Data model database

ABSTRACT

Systems and methods utilize a data model database which includes a plurality of symbol data types. Each of the plurality of symbol data types has one or more symbol data fields. The data model database further includes a plurality of concrete data types. Each of the concrete data types has one or more language-agnostic concrete fields associated with each of the one or more symbol data fields. Each of the one or more language-agnostic concrete fields applies one or more concrete constraints to each of the corresponding symbol data fields. The data model database further includes a plurality of carrier data types. The plurality of carrier data types have one or more language-specific carrier fields associated with each of the one or more language-agnostic concrete fields. Each of the one or more language-specific carrier fields applies one or more carrier constraints to each of the corresponding language-agnostic concrete fields.

TECHNICAL FIELD

The present disclosure relates generally to systems and methods for managing data types, and more particularly concerns a data model database for cross-comparison of different models, languages, and formats.

BACKGROUND

Large-scale software efforts often use structured data in a variety of formats, described using a variety of languages including, e.g., YANG, TOSCA, Heat, JavaScript Object Notation (JSON), XML Schema Definition (XSD), relational data definition language (DDL) definitions, and others. Moreover, in such environments, Application Programming Interfaces (APIs) and data artifacts are often described according to various schemas (e.g., Swagger, JSON Schema) and relational DDLs. Coordinating internal and external software development efforts to ensure interoperability requires that the APIs and data artifacts of the various efforts can be collected and communicated among the relevant parties. This is particularly relevant in modern virtualized networks and/or networks using software defined networking (SDN), where developers leverage model-driven networking across enterprises. For example, work done using the AT&T Enhanced Control, Orchestration, Management and Policy platform (ECOMP) and the related Open Network Automation Platform (ONAP) frequently involves structured data in many formats and languages, despite the fact that any portion of information may have multiple uses or dependencies beyond its carrier configuration.

A common approach to setting standards for common data models and formats is to create a “data dictionary” which describes properties such as the meaning of attribute names (e.g., Timestamp always means a timestamp), and may hold a collection of files that describe standard data formats and models as record layouts or data layouts. Conventional data dictionaries suffer from a variety of limitations that severely restrict their applicability in large-scale software efforts and shared development environments. Routine use of data dictionaries involves manual entry of information or import by filename, followed by cross-referencing between data dictionaries associated with disparate models. Versioning is extremely difficult to perform in real time where multiple developers or entities are engaged in simultaneous efforts. Changes are difficult to propagate: even where a particular data dictionary is updated in a timely manner, related fields in the same or different data dictionaries may not be updated. While some data dictionaries facilitate the use of anonymous types (data types which are not explicitly named), the reuse of common data types between models is laborious and potentially risky for developers.

SUMMARY

The needs existing in the field are addressed by the present disclosure, which relates to systems, methods and computer useable media for development and use of a data-model database facilitating cross-model, cross-language, and cross-format comparison and utilization of data types and related information.

In an embodiment, at least one non-transitory computer readable medium is configured to store a data model database comprising: a plurality of symbol data types, each of the plurality of symbol data types having one or more symbol data fields; a plurality of concrete data types, each of the concrete data types having one or more language-agnostic concrete fields associated with each of the one or more symbol data fields, each of the one or more language-agnostic concrete fields applying one or more concrete constraints to each of the corresponding symbol data fields; and a plurality of carrier data types, the plurality of carrier data types having one or more language-specific carrier fields associated with each of the one or more language-agnostic concrete fields, each of the one or more language-specific carrier fields applying one or more carrier constraints to each of the corresponding language-agnostic concrete fields.
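The three tiers of this embodiment can be pictured with a short Python sketch. It is an illustration only and is not taken from the disclosure. The class names `SymbolType`, `ConcreteType`, and `CarrierType`, the `validate` method, and the modeling of constraints as predicates are all hypothetical. The sketch shows a carrier field refining a concrete field, which in turn constrains a symbol field, using the masked IPv4 address discussed in the detailed description.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical: a constraint is modeled as a simple predicate over a field value.
Constraint = Callable[[object], bool]


@dataclass
class SymbolType:
    name: str
    fields: List[str]  # symbol data fields, e.g. ["address", "mask"]


@dataclass
class ConcreteType:
    name: str
    symbol: SymbolType
    # Language-agnostic concrete constraints, keyed by symbol data field.
    constraints: Dict[str, List[Constraint]] = field(default_factory=dict)


@dataclass
class CarrierType:
    name: str
    language: str  # e.g. "JSON", "YANG", "SQL DDL"
    concrete: ConcreteType
    # Language-specific carrier constraints, keyed by the same fields.
    constraints: Dict[str, List[Constraint]] = field(default_factory=dict)

    def validate(self, record: Dict[str, object]) -> bool:
        """Check every symbol field against carrier, then concrete, constraints."""
        for fname in self.concrete.symbol.fields:
            if fname not in record:
                return False
            # Carrier checks (e.g. type checks) run first, so the concrete
            # range checks never see a value of the wrong type.
            checks = self.constraints.get(fname, []) + self.concrete.constraints.get(fname, [])
            if not all(check(record[fname]) for check in checks):
                return False
        return True


# Illustrative instances for a masked IPv4 address.
masked_ipv4 = SymbolType("MaskedIPv4", ["address", "mask"])
ipv4_as_ints = ConcreteType("MaskedIPv4AsInts", masked_ipv4, {
    "address": [lambda v: 0 <= v < 2**32],
    "mask": [lambda v: 0 <= v <= 32],
})
ipv4_json = CarrierType("MaskedIPv4Json", "JSON", ipv4_as_ints, {
    "address": [lambda v: isinstance(v, int)],
    "mask": [lambda v: isinstance(v, int)],
})

print(ipv4_json.validate({"address": 3232235777, "mask": 24}))  # True
print(ipv4_json.validate({"address": 3232235777, "mask": 33}))  # False
```

In this sketch, a second carrier (for example, a YANG or DDL carrier) would reference the same `ipv4_as_ints` concrete type. That shared reference is what makes cross-language comparison possible.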

In another embodiment, a method comprises identifying a carrier type based on language-specific carrier data from a first system; identifying, in a data model database, a language-agnostic concrete type associated with the language-specific carrier type; identifying a symbol data type associated with the language-agnostic concrete type; and converting the language-specific carrier data to a second language-specific carrier type based on the symbol data type.
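The conversion method can be sketched in Python as follows. This is not the disclosed implementation. The sketch makes several hypothetical choices:

- The `CARRIERS` and `CONCRETE_TO_SYMBOL` tables stand in for lookups against the data model database.
- Each carrier is assumed to map to and from a shared symbol-level form, here a list of four octets.
- The carrier names `json_string` and `sql_bigint` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Octets = List[int]  # hypothetical symbol-level canonical form of an IPv4 address


@dataclass
class Carrier:
    name: str
    concrete: str  # name of the associated language-agnostic concrete type
    to_symbol: Callable[[object], Octets]
    from_symbol: Callable[[Octets], object]


# Stand-ins for lookups that would be made against the data model database.
CONCRETE_TO_SYMBOL: Dict[str, str] = {
    "IPv4AsDottedQuad": "IPv4Address",
    "IPv4AsUnsigned32": "IPv4Address",
}

CARRIERS: Dict[str, Carrier] = {
    "json_string": Carrier(
        "json_string", "IPv4AsDottedQuad",
        to_symbol=lambda s: [int(part) for part in s.split(".")],
        from_symbol=lambda o: ".".join(str(b) for b in o),
    ),
    "sql_bigint": Carrier(
        "sql_bigint", "IPv4AsUnsigned32",
        to_symbol=lambda n: [(n >> shift) & 0xFF for shift in (24, 16, 8, 0)],
        from_symbol=lambda o: (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3],
    ),
}


def convert(value: object, source: str, target: str) -> object:
    """Walk carrier -> concrete -> symbol on the source side, then back down to the target."""
    src, dst = CARRIERS[source], CARRIERS[target]
    symbol = CONCRETE_TO_SYMBOL[src.concrete]
    if CONCRETE_TO_SYMBOL[dst.concrete] != symbol:
        raise ValueError(f"{target} does not carry symbol type {symbol}")
    return dst.from_symbol(src.to_symbol(value))


print(convert("192.168.1.1", "json_string", "sql_bigint"))  # 3232235777
print(convert(3232235777, "sql_bigint", "json_string"))     # 192.168.1.1
```

The symbol check refuses conversions between carriers that do not describe the same underlying data. This reflects the role the symbol data type plays in the method.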

In another embodiment, a method comprises receiving a request to delete a data type collection, wherein the data type collection is reflected in a data model database, and wherein the data type collection includes one or more data types, the one or more data types including at least one of a symbol data type, a language-agnostic concrete type, or a language-specific carrier data type; determining a dependency chain for the data type collection using the data model database; and indicating the data type collection as delete safe or not delete safe, wherein the data type collection is marked as delete safe if the dependency chain indicates that remaining data types in the data model database do not include references to the one or more data types of the data type collection, and wherein the data type collection is marked not delete safe if the dependency chain indicates that one or more of the remaining data types in the data model database include references to the one or more data types of the data type collection.
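One way to realize the delete-safe determination is a transitive reverse-reference walk, sketched below in Python. The reference table `refs`, the type names, and the helpers `dependents` and `is_delete_safe` are hypothetical. The disclosure does not specify how the dependency chain is stored or traversed.

```python
from typing import Dict, Set


def dependents(references: Dict[str, Set[str]], targets: Set[str]) -> Set[str]:
    """Every data type that directly or transitively references any of `targets`."""
    found: Set[str] = set()
    frontier = set(targets)
    while frontier:
        nxt = {t for t, refs in references.items() if refs & frontier and t not in found}
        found |= nxt
        frontier = nxt
    return found


def is_delete_safe(references: Dict[str, Set[str]], collection: Set[str]) -> bool:
    """Delete safe only if no data type outside the collection depends on it."""
    return not (dependents(references, collection) - collection)


# Hypothetical reference table: data type -> data types it refers to.
refs = {
    "MaskedIPv4": set(),                     # symbol data type
    "MaskedIPv4AsInts": {"MaskedIPv4"},      # concrete type
    "MaskedIPv4Json": {"MaskedIPv4AsInts"},  # carrier type
    "Timestamp": set(),
}

print(is_delete_safe(refs, {"MaskedIPv4"}))  # False
print(is_delete_safe(refs, {"MaskedIPv4", "MaskedIPv4AsInts", "MaskedIPv4Json"}))  # True
```

Deleting the symbol type alone is flagged as not delete safe, because the concrete and carrier types built on it would be left dangling. Deleting all three together is delete safe.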

These summary aspects are intended for purposes of example only and should be construed as non-limiting, as alternative and complementary aspects will be apparent based on the Detailed Description hereafter.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating the architecture of an enhanced control, orchestration, management and policy platform in which an embodiment of a control loop automation management platform may be implemented. The embodiment may be implemented in two types of architectures, namely centralized or distributed.

FIG. 2 is a block diagram of a platform for enhanced control, orchestration, management and policy in which embodiments of the control loop automation management platform may be implemented.

FIG. 3 is a block diagram of the service design and creation component, the policy creation component and the analytic application design component of the platform for enhanced control, orchestration, management and policy.

FIG. 4 is a block diagram of the dashboard and active and available inventory module of the platform for enhanced control, orchestration, management and policy.

FIG. 5 is a block diagram of the master service orchestrator component and the data collection, analytics and events component of the platform for enhanced control, orchestration, management and policy.

FIG. 6 is a block diagram of the components for the controllers of the platform for enhanced control, orchestration, management and policy.

FIG. 7A is a block diagram of an embodiment of a data model database disclosed herein.

FIG. 7B is an example expansion of a data type in a data model database disclosed herein.

FIG. 7C is an example data type taxonomy of a data model database disclosed herein.

FIG. 8 is an example entity relationship diagram for a data model database disclosed herein.

FIG. 9 is a flowchart of an example methodology using a data model database disclosed herein.

FIG. 10 is a flowchart of another example methodology using a data model database disclosed herein.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

As described above, routine data dictionaries fail to establish authoritative control over data types in software environments involving multiple entities. An unconventional data model database is disclosed to manage the complex data structures and interrelationships of models in such environments.

A masked IPv4 address, which consists of two parts (the address and the mask), is a simple example of the challenges of composition. The IP address is four numbers separated by periods. It could be represented as, e.g., a string with a particular pattern, an integer, four integer fields, or an array of four integers. Each choice of composition can therefore hinder consistent use of the data. The mask could be an integer with a given range (e.g., a prefix length from 0 to 32), imposing constraints and defining the mask by inheritance. A timestamp associated with the IPv4 address could have any number of string, integer, floating point, or other representations. The disclosed data model database not only controls these various representations within a particular format or model, but also allows for cross-comparison and reuse of the data in a variety of models.
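The equivalence of these representations can be checked with Python's standard `ipaddress` module, as in the short sketch below. The helper `representations` and its dictionary keys are hypothetical labels for the forms listed above. The module rejects a prefix length outside 0 to 32 with a `ValueError`, which plays the role of the mask's range constraint.

```python
import ipaddress


def representations(masked: str) -> dict:
    """Render one masked IPv4 address in several equivalent forms."""
    iface = ipaddress.IPv4Interface(masked)  # raises ValueError for a mask outside 0..32
    addr = iface.ip
    return {
        "dotted_string": str(addr),
        "integer": int(addr),
        "octets": list(addr.packed),
        "prefix_length": iface.network.prefixlen,
        "netmask": str(iface.network.netmask),
    }


print(representations("192.168.1.1/24"))
# {'dotted_string': '192.168.1.1', 'integer': 3232235777, 'octets': [192, 168, 1, 1],
#  'prefix_length': 24, 'netmask': '255.255.255.0'}
```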

FIGS. 1-6 depict an example environment in which a data model database as disclosed herein can be leveraged. Illustrated in FIG. 1 is a schematic of the architecture of an enhanced control, orchestration, management and policy platform (ECOMP platform 100) that is implemented in a cloud environment. The ECOMP platform 100 includes a design time framework component 101 and a runtime execution framework 103. The cloud environment provides a number of capabilities, including:

- real-time instantiation of virtual machines (VMs) on commercial hardware where appropriate;
- dynamic assignment of applications and workloads to VMs;
- dynamic movement of applications and dependent functions to different VMs on servers within and across data centers in different geographies (within the limits of physical access tie-down constraints); and
- dynamic control of resources made available to applications (CPU, memory, storage).

With the use of network function virtualization, network appliances have been transformed into software applications. In the integrated cloud environment, the dynamic cloud capabilities are applied to applications, i.e., virtual network functions (VNFs), thus applying the benefits of the cloud environment to virtual network elements. For example, VNFs such as routers, switches, and firewalls can be “spun up” on commodity hardware and moved dynamically from one data center to another (within the limits of physical access tie-down constraints). Their resources, such as CPU, memory and storage, can also be controlled dynamically.

The ECOMP platform 100 enables the rapid on-boarding of new services and the reduction of operating expenses and capital expenses through its metadata-driven service design and creation platform and its real-time operational management framework, which provides real-time, policy-driven automation of management functions. The metadata-driven service design and creation capabilities enable services to be defined with minimal information technology development, thus contributing to reductions in capital expenses. The real-time operational management framework provides significant automation of network management functions, enabling the detection and correction of problems in an automated fashion and contributing to reductions in operating expenses.

The ECOMP platform 100 enables product/service independent capabilities for design, creation and lifecycle management. The design time framework component 101 is an integrated development environment with tools, techniques, and repositories for defining/describing network assets. The design time framework component 101 facilitates re-use of models, improving efficiency as more models become available for reuse. Assets include models of the cloud environment resources, services and products. The models include various process specifications and policies (e.g., rule sets) for controlling behavior and process execution. Process specifications are used by the ECOMP platform 100 to automatically sequence the instantiation, delivery and lifecycle management aspects of the integrated cloud environment based resources, services, products and the components of the ECOMP platform 100. The design time framework component 101 supports the development of new capabilities, augmentation of existing capabilities and operational improvements throughout the lifecycle of a service. Service design and creation (SDC), policy, and data collection, analytics and events (DCAE) software development kits (SDKs) allow operations/security, 3rd parties (e.g., vendors), and other experts to continually define/refine new collection, analytics, and policies (including recipes for corrective/remedial action) using a design framework portal. Certain process specifications (aka ‘recipes’) and policies are geographically distributed to many points of use to optimize performance and maximize autonomous behavior in the integrated cloud environment's federated cloud environment.

The runtime execution framework 103 executes the rules and policies distributed by a design and creation environment. This allows for the distribution of policy enforcement and templates among various ECOMP modules (described below). These components advantageously use common services that support logging, access control, and data management.

Illustrated in FIG. 2 are the components of an embodiment of the ECOMP platform 100. The ECOMP platform 100 is provided with three environments: the design creation environment 201, the execution environment 203, and the managed environment 205, shown as shaded areas in FIG. 2.

The ECOMP platform 100 includes an ECOMP Portal 207 that provides design functions 209 and operations functions 211. The design functions 209 include a service design and creation component 213 and a policy creation component 215. The operations functions 211 include an analytic application design component 217 and a dashboard 219. The service design and creation component 213, the policy creation component 215 and the analytic application design component 217 are all part of the design creation environment 201. The dashboard 219 is part of the execution environment 203.

In addition to the dashboard 219, the execution environment 203 includes:

- an external data movement and application program interface component (API component 221);
- an active and available inventory module (A&AI module 223);
- a master service orchestrator (MSO 225);
- a data collection, analytics and events component (DCAE module 227);
- controllers 229;
- a common services component 231; and
- a recipe/engineering rules and policy distribution component 233.

The managed environment 205 comprises resources, either hardware or software, that may be categorized as:

- infrastructure resources (the Cloud resources, e.g., Storage 235, Compute 237);
- networking resources 239 (network connectivity functions and elements); and
- VNF/application resources 241 (the features and capabilities of a software application).

Interacting with the execution environment may be an operations, administration and management controller (OA&M Controller 243) and a number of external applications 245 that may include e-services 247, business support system and operational support systems (BSS/OSS application 249), and big data services 251, among others.

Illustrated in FIG. 3 are the subcomponents of the service design and creation component 213. The service design and creation component 213 is an integrated development environment with tools, techniques and repositories to define/simulate/certify cloud environment assets as well as their associated processes and policies. The service design and creation component 213 may include a design studio subcomponent 301, a resource onboarding subcomponent 303, a certification studio subcomponent 305, and a catalog subcomponent 307. Catalog subcomponent 307 may include information about groups such as products 309, services 311, resources 313 and processes 315.

The policy creation component 215 deals with policies, which are conditions and requirements, constraints, attributes, or needs that must be provided, maintained, and/or enforced. At a lower level, the policy creation component 215 involves machine-readable rules enabling actions to be taken based on triggers or requests. Policies often consider specific conditions in effect, both in triggering specific policies when conditions are met and in selecting specific outcomes of the evaluated policies appropriate to the conditions. Policies allow rapid updates through easily updating rules, thus updating the technical behavior of components in which those policies are used without requiring rewrites of their software code. Policies permit simpler management/control of complex mechanisms via abstraction. The policy creation component 215 may include a policy editor 317, a policy rules subcomponent 319, a conflict identification subcomponent 321, and a policy storage subcomponent 323. The policy storage subcomponent 323 may include a library 325 and templates 327.

The policy creation component 215 has a broad scope supporting infrastructure, product/services, operation automation, and security-related policy rules. These policy rules are defined by multiple stakeholders (Network/Service Designers, Operations, Security, customers, etc.). In addition, input from various sources (service design and creation component 213, policy editor 317, customer input, etc.) is collected and rationalized. Therefore, a centralized policy creation environment will be used to validate policy rules, identify and resolve overlaps and conflicts, and derive policies where needed. The policy creation component 215 is accessible, developed and managed as a common asset, and provides editing tools to allow users to easily create or change policy rules. Offline analysis of performance/fault/closed-loop action data is used to identify opportunities to discover new signatures and refine existing signatures and closed loop operations. Policy translation/derivation functionality is also included to derive lower level policies from higher level policies. Conflict detection and mitigation are used to detect and resolve policies that may potentially cause conflicts prior to distribution. Once validated and free of conflicts, policies are placed in an appropriate repository.
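A simple form of the conflict detection step is sketched below in Python. The disclosure does not define its conflict test. The policy dictionaries, the `scope` and `settings` keys, and the rule used here are hypothetical assumptions: two policies are treated as conflicting if their scopes can apply to the same function and they assign different values to the same setting.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def scopes_overlap(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """Two scopes overlap unless they pin the same key to different values."""
    return all(b[k] == v for k, v in a.items() if k in b)


def find_conflicts(policies: List[dict]) -> List[Tuple[str, str, str]]:
    conflicts = []
    for p, q in combinations(policies, 2):
        if not scopes_overlap(p["scope"], q["scope"]):
            continue
        # Same setting, overlapping scope, different value -> potential conflict.
        for param in sorted(p["settings"].keys() & q["settings"].keys()):
            if p["settings"][param] != q["settings"][param]:
                conflicts.append((p["name"], q["name"], param))
    return conflicts


policies = [
    {"name": "fw-default", "scope": {"vnf": "firewall"}, "settings": {"max_sessions": 1000}},
    {"name": "fw-east", "scope": {"vnf": "firewall", "region": "east"}, "settings": {"max_sessions": 500}},
    {"name": "router-default", "scope": {"vnf": "router"}, "settings": {"max_sessions": 200}},
]
print(find_conflicts(policies))  # [('fw-default', 'fw-east', 'max_sessions')]
```

A real mitigation step might resolve such a pair by precedence, for example letting the narrower scope win. This sketch only reports the conflict.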

After completing initial policy creation or modification to existing policies, the policy distribution component 233 sends policies (e.g., from the repository) to their points of use in advance of when they are needed. This distribution is intelligent and precise, such that each distributed policy-enabled function automatically receives only the specific policies which match its needs and scope.
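Scope-based distribution might look like the Python sketch below. The `Policy` and `PolicyEnabledFunction` classes, the attribute names, and the matching rule are all hypothetical. Under that rule, a policy is delivered to a function only if every key in the policy's scope matches the function's attributes, and an empty scope matches every function.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Policy:
    name: str
    scope: Dict[str, str]  # an empty scope applies everywhere


@dataclass
class PolicyEnabledFunction:
    name: str
    attributes: Dict[str, str]
    policies: List[str] = field(default_factory=list)


def matches(policy: Policy, fn: PolicyEnabledFunction) -> bool:
    return all(fn.attributes.get(k) == v for k, v in policy.scope.items())


def distribute(policies: List[Policy], functions: List[PolicyEnabledFunction]) -> None:
    """Give each function only the policies whose scope matches it."""
    for fn in functions:
        fn.policies = [p.name for p in policies if matches(p, fn)]


policies = [
    Policy("fw-east-rules", {"vnf": "firewall", "region": "east"}),
    Policy("global-logging", {}),
]
fw = PolicyEnabledFunction("fw-1", {"vnf": "firewall", "region": "east"})
rtr = PolicyEnabledFunction("rtr-1", {"vnf": "router", "region": "east"})
distribute(policies, [fw, rtr])
print(fw.policies)   # ['fw-east-rules', 'global-logging']
print(rtr.policies)  # ['global-logging']
```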

Notifications or events can be used to communicate links/URLs for policies to components needing policies, so that components can utilize those links to fetch particular policies or groups of policies as needed. Components in some cases may also publish events indicating they need new policies, eliciting a response with updated links/URLs. Also, in some cases policies can be given to components indicating they should subscribe to one or more policies, so that they receive updates to those policies automatically as they become available.
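The subscription pattern can be sketched as a small in-memory publish/subscribe class in Python. `PolicyNotifier`, its methods, and the URL are hypothetical, and a deployment would more likely use a message bus. Consistent with the paragraph above, the notification carries only a link, and each subscriber decides when to fetch the policy body.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, str], None]


class PolicyNotifier:
    """In-memory stand-in for a message bus that carries policy links."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, policy_name: str, callback: Callback) -> None:
        self._subscribers[policy_name].append(callback)

    def publish_update(self, policy_name: str, url: str) -> int:
        """Notify subscribers with a link only; return how many were notified."""
        callbacks = self._subscribers.get(policy_name, [])
        for callback in callbacks:
            callback(policy_name, url)
        return len(callbacks)


received = []
notifier = PolicyNotifier()
notifier.subscribe("fw-east-rules", lambda name, url: received.append((name, url)))
notifier.publish_update("fw-east-rules", "https://policy.example.invalid/fw-east-rules/v2")
print(received)  # [('fw-east-rules', 'https://policy.example.invalid/fw-east-rules/v2')]
```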

The analytic application design component 217 includes an analytics software development kit (SDK 329) and storage 331 for key performance indicators (KPIs), alarms, operators, etc., as well as storage for analytic application 333.

As shown in FIG. 4, the dashboard 219 includes a manual action subcomponent 401, a reporting subcomponent 403 and a topology visualization subcomponent 405. The dashboard 219 provides access to design, analytics and operational control/administration functions.

The A&AI module 223 is the component that provides real-time views of the resources, services, products and their relationships. The views provided by the A&AI module 223 relate data managed by multiple ECOMP platforms 100, business support systems and operation support systems (BSS/OSS application 249), and network applications to form a “top to bottom” view ranging from the products customers buy to the resources that form the raw material for creating the products. A&AI module 223 not only forms a registry of products, services, and resources, but also maintains up-to-date views of the relationships between these inventory items. Active and available inventory submodule 409 manages these multi-dimensional relationships in real time. The A&AI module 223 is provided with an inventory management submodule 407, an entitlements submodule 411 and a resource/service topology submodule 413.

The inventory management submodule 407 maintains real-time inventory and topology data by being continually updated as changes are made within the integrated cloud. It uses graph data technology to store relationships between inventory items. Graph traversals can then be used to identify chains of dependencies between items. Data views of the A&AI module 223 are used by homing logic during real-time service delivery, root cause analysis of problems, impact analysis, capacity management, software license management and many other integrated cloud environment functions.
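As an illustration of such a traversal, the Python sketch below performs impact analysis with a breadth-first walk over reversed "depends on" edges. The inventory dictionary, the item identifiers, and the `impacted_by` helper are hypothetical. The actual graph technology used by the A&AI module 223 is not specified here.

```python
from collections import deque
from typing import Dict, List


def impacted_by(depends_on: Dict[str, List[str]], failed: str) -> List[str]:
    """Breadth-first list of every item that directly or indirectly depends on `failed`."""
    # Invert the edges so we can walk from a resource up to what relies on it.
    dependents: Dict[str, List[str]] = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(item)
    seen, order, queue = {failed}, [], deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


inventory = {
    "service:vpn-42": ["vnf:firewall-1", "vnf:router-1"],
    "vnf:firewall-1": ["vm:a"],
    "vnf:router-1": ["vm:b"],
    "vm:a": ["host:rack7-1"],
    "vm:b": ["host:rack7-2"],
}
print(impacted_by(inventory, "host:rack7-1"))
# ['vm:a', 'vnf:firewall-1', 'service:vpn-42']
```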

The inventory and topology data includes resources, services, products, and customer subscriptions, along with topological relationships between them. Relationships captured by A&AI module 223 include “top to bottom” relationships, such as those defined in the service design and creation component 213, where products are composed of services and services are composed of resources. They also include “side to side” relationships, such as end to end connectivity of virtualized functions to form service chains. A&AI module 223 also keeps track of the span of control of each controller, and is queried by MSO 225 and placement functions to identify which controller to invoke to perform a given operation.

A&AI module 223 is metadata driven, allowing new inventory item types to be added dynamically and quickly via catalog definitions and reducing the need for lengthy development cycles. A&AI module 223 satisfies the following key requirements:

- Provide accurate and timely views of resource, service, and product inventory and their relationship to the customer's subscription;
- Deliver topologies and graphs;
- Maintain relationships to other key entities (e.g., location) as well as non-integrated cloud environment inventory;
- Maintain the state of active, available and assigned inventory within the ECOMP platform 100;
- Allow introduction of new types of Resources, Services, and Products without a software development cycle (i.e., be metadata driven);
- Be easily accessible and consumable by internal and external clients;
- Provide functional APIs that expose invariant services and models to clients;
- Provide highly available and reliable functions and APIs capable of operating as generic cloud workloads that can be placed arbitrarily within the cloud infrastructure capable of supporting those workloads;
- Scale incrementally as volumes in the ECOMP platform 100 and cloud infrastructure scale;
- Perform to the requirements of clients, with quick response times and high throughput;
- Enable vendor product and technology swap-outs over time, e.g., migration to a new technology for data storage or migration to a new vendor for MSO 225 or Controllers 229;
- Enable dynamic placement functions to determine which workloads are assigned to specific components of the ECOMP platform 100 (i.e., Controllers 229 or VNFs) for optimal performance and utilization efficiency; and
- Identify the controllers 229 to be used for any particular request.

A&AI module 223 also performs a number of administrative functions. Given the model-driven basis of the ECOMP platform 100, metadata models for the various catalog items are stored, updated, applied and versioned dynamically as needed without taking the system down for maintenance. Given the distributed nature of the A&AI module 223, as well as its relationships with other components of the ECOMP platform 100, audits are periodically run to ensure that the A&AI module 223 is in sync with the inventory masters such as controllers 229 and MSO 225. Adapters allow the A&AI module 223 to interoperate with non-integrated cloud environment systems as well as 3rd party cloud providers via evolving cloud standards.

Consistent with other applications of the ECOMP platform 100, the A&AI module 223 produces canned and ad-hoc reports, integrates with the dashboard 219, publishes notifications other components of the ECOMP platform 100 can subscribe to, and performs logging consistent with configurable framework constraints.

The primary function of MSO 225 is the automation of end-to-end service instance provisioning activities. As shown in FIG. 5, MSO 225 includes a request handler 501, an orchestration engine 503, adapters 505, and service catalog service recipes 507. MSO 225 provides an interface to orchestrate delivery of integrated cloud environment services. In general, orchestration can be viewed as the definition and execution of workflows or processes to manage the completion of a task. The ability to graphically design and modify a workflow process is the key differentiator between an orchestrated process and a standard compiled set of procedural code. Orchestration provides adaptability and improved time-to-market due to the ease of definition and change without the need for a development engagement. As such, it is a primary driver of flexibility in the architecture. Interoperating with policies, the combination provides a basis for the definition of a flexible process that can be guided by business and technical policies and driven by process designers.

Orchestration exists throughout the integrated cloud environment architecture and is not limited to the constraints implied by the term “workflow,” which typically implies some degree of human intervention. Orchestration in the integrated cloud environment will not involve human intervention/decision/guidance in the vast majority of cases. Human involvement in orchestration is typically performed up front in the design process, although some processes will require intervention or alternate action, such as exception or fallout processing.

To support the large number of Orchestration requests, the orchestration engine 503 will be exposed as a reusable service. With this approach, any component of the architecture can execute process recipes. Orchestration services will be capable of consuming a process recipe and executing against it to completion. The Service model maintains consistency and reusability across all orchestration activities and ensures consistent methods, structure and version of the workflow execution environment.
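A reusable recipe executor of this kind could be as simple as the Python sketch below. The recipe format (a list of steps with `name` and `action` keys), the handler table, and `run_recipe` are hypothetical. The sketch runs a recipe to completion, and the first failing step hands off to fallout processing.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


def run_recipe(recipe: List[Dict[str, str]], handlers: Dict[str, Handler], context: dict) -> dict:
    """Execute recipe steps in order; stop and report fallout on the first failure."""
    completed: List[str] = []
    for step in recipe:
        try:
            handlers[step["action"]](context)
        except Exception as exc:  # route the failure to fallout processing
            return {"status": "fallout", "failed_step": step["name"],
                    "error": str(exc), "completed": completed}
        completed.append(step["name"])
    return {"status": "complete", "completed": completed}


def assign_ip(ctx: dict) -> None:
    ctx["ip"] = "10.0.0.5"


def create_vm(ctx: dict) -> None:
    ctx.setdefault("vms", []).append("vm-1")


def reserve_capacity(ctx: dict) -> None:
    raise RuntimeError("no capacity in region")


handlers = {"assign_ip": assign_ip, "create_vm": create_vm, "reserve": reserve_capacity}
recipe = [{"name": "ip", "action": "assign_ip"}, {"name": "vm", "action": "create_vm"}]
print(run_recipe(recipe, handlers, {}))
# {'status': 'complete', 'completed': ['ip', 'vm']}
```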

As shown in FIG. 5, DCAE module 227 includes an analytic applications module 509, a streaming framework 511, an events pub/sub 513, real-time collectors 515, APIs 517 and a batch collector 519. In the integrated cloud environment, virtualized functions across various layers of functionality are expected to be instantiated in a significantly dynamic manner. This requires the ability to provide real-time responses to actionable events from virtualized resources and applications, as well as to requests from customers, carrier partners and other providers. In order to engineer, plan, bill and assure these dynamic services, DCAE module 227 within the framework of the ECOMP platform 100 gathers key performance, usage, telemetry and events from the dynamic, multi-vendor virtualized infrastructure in order to compute various analytics and respond with appropriate actions based on any observed anomalies or significant events. These significant events include application events that lead to resource scaling, configuration changes, and other activities, as well as faults and performance degradations requiring healing. The collected data and computed analytics are stored for persistence as well as for use by other applications for business and operations (e.g., billing, ticketing). More importantly, the DCAE module 227 must perform many of these functions in real time.

DCAE module 227 provides real-time collectors 515 necessary to collect the instrumentation made available in the integrated cloud infrastructure. The scope of the data collection includes all of the physical and virtual elements (compute, storage and network) in the integrated cloud infrastructure. The collection includes:

- the types of events data necessary to monitor the health of the managed environment;
- the types of data to compute the key performance and capacity indicators necessary for elastic management of the resources; and
- the types of granular data (e.g., flow, session and call records) needed for detecting network and service conditions, etc.

The collection will support both real-time streaming and batch methods of data collection.

DCAE module 227 needs to support a variety of applications and use cases, ranging from real-time applications that have stringent latency requirements to other analytic applications that need to process a range of unstructured and structured data. DCAE module 227 needs to support all of these needs and must do so in a way that allows for incorporating new storage technologies as they become available. This may be done by encapsulating data access via APIs and minimizing application knowledge of the specific technology implementations.

Given the scope of requirements around the volume, velocity and variety of data that DCAE module 227 needs to support, the storage may use technologies that Big Data has to offer, such as support for NoSQL technologies, including in-memory repositories, and support for raw, structured, unstructured and semi-structured data. While detailed data may be retained at the edge layer of DCAE module 227 for detailed analysis and trouble-shooting, applications may optimize the use of bandwidth and storage resources by ensuring they propagate only the required data (reduced, transformed, aggregated, etc.) for other analyses.
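The kind of reduction mentioned above can be sketched in Python as a per-resource aggregation. The `(resource_id, value)` sample format, the resource names, and the chosen summary fields are hypothetical. The idea is that raw samples stay at the edge, and only a compact summary per resource is forwarded for other analyses.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def aggregate(samples: Iterable[Tuple[str, float]]) -> Dict[str, dict]:
    """Reduce raw (resource_id, value) samples to one summary per resource."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for resource_id, value in samples:
        grouped[resource_id].append(value)
    return {rid: {"count": len(vals), "mean": mean(vals), "max": max(vals)}
            for rid, vals in grouped.items()}


raw = [("vm:a", 10.0), ("vm:a", 30.0), ("vm:b", 5.0)]
print(aggregate(raw))
# {'vm:a': {'count': 2, 'mean': 20.0, 'max': 30.0}, 'vm:b': {'count': 1, 'mean': 5.0, 'max': 5.0}}
```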

The DCAE module 227 includes an analytic framework, which is an environment that allows for development of real-time applications (e.g., analytics, anomaly detection, capacity monitoring, congestion monitoring, alarm correlation, etc.) as well as other non-real-time applications (e.g., analytics, forwarding synthesized or aggregated or transformed data to Big Data stores and applications). The intent is to structure an environment that allows for agile introduction of applications from various providers (Labs, IT, vendors, etc.). The framework supports the ability to process both a real-time stream of data and data collected via traditional batch methods. The analytic framework supports methods that allow developers to compose applications that process data from multiple streams and sources. Analytic applications are developed by various organizations; however, they all run in the DCAE module 227 and are managed by a DCAE controller (not shown). These applications are micro-services developed by a broad community and adhere to the standards of the ECOMP platform 100.

Examples of the types of applications that can be built on top of DCAE module 227, and that depend on the timely collection of detailed data and events by DCAE module 227, are described below. Analytics applications will be the most common applications. They process the collected data and derive interesting metrics or analytics for use by other applications or operations. These analytics range from very simple ones (from a single source of data) that compute usage, utilization, latency, etc., to very complex ones that detect specific conditions based on data collected from various sources. The analytics could be capacity indicators used to adjust resources, or performance indicators pointing to anomalous conditions requiring response.

The Fault/Event Correlation application is a key application that processes events and thresholds published by managed resources or by other applications that detect specific conditions. Based on defined rules, policies, known signatures and other knowledge about the network or service behavior, this application would determine the root cause of various conditions and notify interested applications and operations.

A performance surveillance and visualization application provides a window to operations, notifying them of network and service conditions. The notifications could include outages and impacted services or customers, based on various dimensions of interest to Operations. Such applications provide visual aids ranging from geographic dashboards to virtual information model browsers to detailed drill-down to specific service or customer impacts.

The capacity planning application provides planners and engineers the ability to adjust forecasts based on observed demands, as well as to plan specific capacity augments at various levels, e.g., the network functions virtualization infrastructure (NFVI) level (technical plant, racks, clusters, etc.), the Network level (bandwidth, circuits, etc.), and the Service or Customer levels.
A testing and trouble-shooting application provides operations the tools to test and trouble-shoot specific conditions. They could range from simple health checks for testing purposes to complex service emulations orchestrated for troubleshooting purposes. In both cases, DCAE module 227 provides the ability to collect the results of health checks and tests that are conducted. These checks and tests could be done on an ongoing basis, scheduled, or conducted on demand. Some components of an integrated cloud environment may expose new targets for security threats. Orchestration and control, decoupled hardware and software, and commodity hardware may be more susceptible to attack than proprietary hardware. However, software defined networks (SDN) and virtual networks also offer an opportunity for collecting a rich set of data for security analytics applications to detect anomalies that signal a security threat, such as a distributed denial of service (DDoS) attack, and automatically trigger mitigating action. The applications listed above are by no means exhaustive, and the open architecture of DCAE module 227 will lend itself to integration of application capabilities over time from various sources and providers.

Illustrated in FIG. 6 are the components of the controllers 229. Controllers 229 include an engineering rules and inventories module 601, a service logic interpreter module 603, a network adapter module 605, and an application adapter module 607. Controllers are applications that are intimately tied to cloud and network services; they execute configuration and real-time policies, and control the state of distributed components and services.

ECOMP platform 100, as described in FIGS. 1-6, can be modified for alternative implementations of a “network operating system” for software-defined networks, such as the Open Network Automation Platform (ONAP). ONAP is an open-source architecture and, as such, introduces additional complexity and programming variance into the environment.

Whether in ECOMP, ONAP, or other environments, development for software-defined networks presents challenges relating to the use of disparate languages and models. The distributed nature and mixture of commodity, enterprise, and other clouds makes cross-platform collaboration and integration (including, e.g., use and reuse of information and data types, relationship and dependency management, and data graph generation) impossible with existing data dictionaries or converters. Laborious manual management of cross-references is also limited by the exposure and comprehension of developers, and it cannot auto-update to modify relationships or ensure that relationships are properly updated before data or data types are deleted.

FIG. 7A illustrates an example data model database 700. Data model database 700 illustrates an example embodiment of a database configured to overcome the limitations of well-understood data dictionaries and similar solutions. Data model database 700 includes data representation tables 710 and relationship tables 750.

Data representation tables 710 include data types 712. Data types 712 is a table representing the specific data types of the database. In various embodiments, its entries include (but are not limited to) the name of the data type, the classification of the data type (described herein), any parent data type(s) from which the type is subclassed, a specification as to whether the data type is a container (and if so, the container type), user-level metadata regarding the data type (e.g., comments), auditing metadata about the data type (e.g., creation time, last update time, identification of creator and updaters), parsing metadata about the data type (e.g., to assist in lossless recreation of the data type in its original language), and constraints on the data type.

Data classification in the disclosed data model database 700 accords with an unconventional arrangement. A given piece of data has a “concept” of what is represented, but is represented according to various languages and formats. Returning to the earlier example of an IPv4 address, different languages each provide myriad ways to represent an address. For example, a JSON schema can include a “native” representation for an IPv4 address in the form of a string in dotted-quad notation. A JSON schema string layout may require that this be enclosed in quotes and encoded in, e.g., Unicode. A similar arrangement can be provided in YAML, but YAML may not require the use of quotes. Postgres has two native types for IPv4 (and IPv6), called inet and cidr (although the latter is intended to represent networks). The specific representation is opaque, but Postgres possesses several means for communicating these data types, which can include, e.g., formatted string or binary representations. An IPv4 address may also be represented as an unsigned 32-bit integer (Uint32), which can in turn be represented as a sequence of 4 bytes in little-endian order or as a string of digits in some encoding format.
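The representations listed above can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch: the address 192.168.1.10 is an arbitrary example, and the little-endian byte layout simply follows the paragraph above (network protocols conventionally use big-endian order instead).

```python
import ipaddress

# One concept (an IPv4 address) in several of the representations described above.
addr = ipaddress.IPv4Address("192.168.1.10")

dotted_quad = str(addr)                          # dotted-quad string (JSON/YAML style)
json_text = f'"{dotted_quad}"'                   # JSON string layout: enclosed in quotes
as_uint32 = int(addr)                            # unsigned 32-bit integer (Uint32)
little_endian = as_uint32.to_bytes(4, "little")  # 4 bytes in little-endian order

# Each representation converts back to the same address.
assert ipaddress.IPv4Address(as_uint32) == addr
assert ipaddress.IPv4Address(int.from_bytes(little_endian, "little")) == addr

print(json_text, as_uint32, little_endian.hex())
```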

To manage these various alternatives, data model database 700 utilizes classifications of symbol, concrete, and carrier types. For ease of understanding, the following formatting conventions are observed when describing type names:

- a symbol type name is in italics, for example, IPv4 or timestamp;
- a concrete type name has only its first letter capitalized, for example, String or Integer;
- a carrier type name has the format <system>-<name>, for example, JSON-string or Postgres-bigint; and
- a regular type name (e.g., of a composite type, as described herein) does not follow the above conventions, but is non-italic, has a lowercase first letter, and includes no hyphens, for example, maskedIP or routing_table.

It is noted that this formatting is provided for illustrative purposes only, and forms no portion of the classification, formatting, syntax, or other functional portion of technologies disclosed herein.

A symbol data type describes the “concept” of the data type in a platform-agnostic format, and can be represented according to multiple concrete types. The symbol type abstracts away details of representation and represents a unit of data, such as an integer, timestamp, IP address, or others, with an exact representation deferred until more details are needed for a particular use. A symbol type facilitates substitutions, conversions, and cross-comparisons. A concrete type is a data type that has a specific matching type in some target schema. Examples of concrete types include (but are not necessarily limited to) a string, subject to constraints, or an integer, subject to constraints. A carrier type is a system-specific type, representing the symbol and/or concrete types according to a specific schema. Carrier types, such as YAML-string, JSON-string, and Postgres-text, can be encoded according to various techniques (e.g., Unicode or ASCII).

While mapping the details of carrier types is important for, e.g., data translations, the extensive details make mappings complex. To reduce complexity, the concrete type layer is not tied to the specific representation system of the carrier type layer. Symbol types resolve to concrete types, and concrete types resolve to carrier types.

FIG. 7B illustrates the earlier example of an IPv4 address according to this classification. The symbol type is the “concept” of an IPv4 address. Carrier types include, but are not limited to, a JSON string or SQL text type (as implementations of the String concrete type) and an XML integer or SQL bigint (as implementations of the Integer concrete type).
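As a rough illustration of how symbol types resolve to concrete types and concrete types resolve to carrier types, the FIG. 7B example can be held in two lookup tables. The dictionary names, the carrier names written in the <system>-<name> convention, and the helper `carriers_for` are hypothetical stand-ins for relationship tables 750.

```python
# Hypothetical stand-ins for the FIG. 7B relationships (symbol -> concrete -> carrier).
CONCRETE_TYPES_OF_SYMBOL = {"IPv4": ["String", "Integer"]}
CARRIER_TYPES_OF_CONCRETE = {
    "String": ["JSON-string", "SQL-text"],
    "Integer": ["XML-integer", "SQL-bigint"],
}

def carriers_for(symbol: str) -> dict[str, list[str]]:
    """Resolve a symbol type to carrier types, grouped by the concrete type in between."""
    return {
        concrete: CARRIER_TYPES_OF_CONCRETE.get(concrete, [])
        for concrete in CONCRETE_TYPES_OF_SYMBOL.get(symbol, [])
    }

print(carriers_for("IPv4"))
```

Keeping the two layers in separate tables mirrors the point above: the concrete layer stays independent of any one carrier system.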

Further aspects of the classification system utilized in data model database 700 involve composite types, which are data types composed from one or more other types. A struct composite type includes one or more fields of other types, and a container type is a container for one or more other types (e.g., list, set, bag, map). FIG. 7C illustrates an example taxonomy including the composite types.

In contrast to composite types, a basic type contains no other types. Basic types include symbol, concrete, and carrier types, or can be derived from a basic type by adding constraints, which can be stored in, e.g., value constraint table 722, container constraint table 724, field pattern constraint table 726, and/or enumeration constraint table 728. For example, an orchestration_status type can be derived from String by adding a constraint that any instance take a value selected from pending-create, created, active, pending-delete, or deleted. Composite types can have a mixture of symbol, concrete, and carrier elements.
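A derived basic type such as orchestration_status can be modeled as a parent type plus an enumeration constraint. The `BasicType` class below is a hypothetical sketch in which constraints are plain predicates; it is not the schema of the constraint tables.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class BasicType:
    """Hypothetical basic type: an optional parent plus predicate-style constraints."""
    name: str
    parent: Optional["BasicType"] = None
    constraints: list[Callable[[object], bool]] = field(default_factory=list)

    def accepts(self, value: object) -> bool:
        # A derived type is at least as constrained as its parent.
        if self.parent is not None and not self.parent.accepts(value):
            return False
        return all(check(value) for check in self.constraints)

String = BasicType("String", constraints=[lambda v: isinstance(v, str)])

ORCHESTRATION_VALUES = {"pending-create", "created", "active", "pending-delete", "deleted"}
orchestration_status = BasicType(
    "orchestration_status",
    parent=String,
    constraints=[lambda v: v in ORCHESTRATION_VALUES],  # enumeration constraint
)

print(orchestration_status.accepts("active"), orchestration_status.accepts("running"))
```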

A derived type inherits the properties of a parent type and adds constraints and/or fields to the parent type. Since a derived type is at least as constrained as the parent type, and contains at least the parent's fields, a record of the parent type can be formed by discarding the extra fields of the child type.
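Forming a parent record from a child record amounts to dropping the child's extra fields. A minimal sketch, assuming records are dictionaries keyed by field name; the `vm_record` parent type and its fields are hypothetical.

```python
def project_to_parent(record: dict, parent_fields: set) -> dict:
    """Form a parent-type record by discarding the child type's extra fields."""
    missing = parent_fields - set(record)
    if missing:
        raise ValueError(f"record lacks parent fields: {sorted(missing)}")
    return {name: value for name, value in record.items() if name in parent_fields}

# Hypothetical parent type vm_record {name, uptime}; the child type adds a host field.
child = {"name": "vm1", "uptime": 42, "host": "h7"}
print(project_to_parent(child, {"name", "uptime"}))
```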

Inheritance cannot change the basic nature of the data type: if the parent is basic/struct/container, the child must also be basic/struct/container. Therefore, a type that inherits from a basic type or a container type cannot add fields. A type that inherits from a basic type can add constraints to the basic type, e.g., facets, as discussed in relation to constraint tables and other portions herein. A type that inherits from a struct type can add constraints to the pattern of fields that can exist.

In some modeling languages (e.g., XSD), it is possible to add constraints to deeply nested fields. In the context of data model database 700, a data type d′ that inherits from data type d can override a field f of d by replacing the type t of field f with a type t′ that is derived from t, either directly or transitively.

A container type can have constraints related to the container (e.g., size constraints). A container type can also replace contained types with descendants of the contained types. All added constraints must be compatible with what they constrain: container constraints for containers (as stored in container constraints table 724) and type-specific constraints for basic types.

A type that inherits from a symbol basic type is also symbol and basic, and a type that inherits from a concrete basic type is also concrete and basic. So, for example, an orchestration status type can be defined by inheritance from concrete type String, and therefore is also concrete.

A symbol composite type is one that is (transitively) composed of at least one symbol basic type. That is: a data type that directly depends on (through fields or contained types) at least one symbol basic type is a symbol composite type; and a data type that depends on (through fields or contained types) a symbol composite type is a symbol composite type.

A pure symbol composite type is transitively composed of only symbol types. A pure concrete composite (or pure concrete) type is transitively composed of only concrete types. Similarly, a pure carrier composite (or pure carrier) type is transitively composed of only carrier types. A strict symbol type is a basic symbol type with no constraints, and a strict concrete type is a basic concrete type with no constraints.

For example, consider the definition of a d_mixed type:

d_mixed:
    alpha: String
    beta: integer

Data type d_mixed is a symbol composite type because it contains a symbol type, integer, as one of its basic types. However, it is neither pure symbol nor pure concrete.

For another example, consider d_symbol:

d_symbol:
    alpha: string
    beta: integer

Type d_symbol is pure symbol because all of its basic data types are symbol.

In another example, consider d_concrete:

d_concrete:
    alpha: String
    beta: Integer

Type d_concrete is pure concrete because all of its basic data types are concrete.

In another example, consider d_complex:

d_complex:
    gamma: d_symbol
    delta: d_concrete

Type d_complex is transitively a symbol composite type because one of its components is d_symbol, which is a symbol composite type. However, d_complex is not pure symbol because it is not composed only of pure symbol types.
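The classifications in these examples can be checked mechanically by collecting the kinds of all basic types that a type is transitively composed of. This sketch assumes a hypothetical registry in which symbol and concrete basic types are distinguished by their names (as in the formatting convention above) and in which struct definitions are not recursive.

```python
# Hypothetical registry: basic types map to their kind; structs map to their field types.
BASIC = {
    "String": "concrete", "Integer": "concrete",  # concrete: first letter capitalized
    "string": "symbol", "integer": "symbol",      # symbol (italic in the text)
}
STRUCTS = {
    "d_mixed": ["String", "integer"],
    "d_symbol": ["string", "integer"],
    "d_concrete": ["String", "Integer"],
    "d_complex": ["d_symbol", "d_concrete"],
}

def basic_kinds(t: str) -> set[str]:
    """Kinds of all basic types that t is transitively composed of."""
    if t in BASIC:
        return {BASIC[t]}
    return set().union(*(basic_kinds(f) for f in STRUCTS[t]))

def is_symbol_composite(t: str) -> bool:
    return t in STRUCTS and "symbol" in basic_kinds(t)

def is_pure(t: str, kind: str) -> bool:
    return basic_kinds(t) == {kind}

for name in STRUCTS:
    print(name, is_symbol_composite(name), is_pure(name, "symbol"), is_pure(name, "concrete"))
```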

A symbol basic type can be a representation of a concrete basic type (e.g., integer), or it can be a representation of a larger construction (e.g., routing_table). A concrete representation of a routing_table might be a composite type. One symbol basic type can be derived from another by adding constraints.

The symbol type integer represents any integer type. However, many carrier types (e.g., Postgres-integer, Postgres-smallint) have a restricted range. In many cases, a field with a symbol type (e.g., an integer field population) should not be stored in a carrier type with a small range (e.g., Postgres-smallint). If the data type author wishes to make the required range of field population explicit, they can use a symbol type that is derived from integer by adding constraints. These constraints can be interpreted as the range of the integers (or other symbol types) stored by the constrained symbol type.

In an example, a type bigint can be derived from integer with the constraint that its range is in [−2⁶³, 2⁶³−1]. By stating that population is of type bigint, the data type authors can place constraints on any carrier type used to represent the field population (e.g., that its range be at least [−2⁶³, 2⁶³−1]). Similar types such as longint or shortint can be derived from integer, with constraints limiting their respective ranges to different values (e.g., by bounding the greater_equal or less_equal range values).
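The effect of such a range constraint on carrier selection can be sketched as a containment check. The ranges are those of PostgreSQL's smallint, integer, and bigint types; the constraint dictionary and the helper `carrier_can_hold` are hypothetical.

```python
# Ranges of PostgreSQL's smallint, integer, and bigint types.
CARRIER_RANGES = {
    "Postgres-smallint": (-2**15, 2**15 - 1),
    "Postgres-integer": (-2**31, 2**31 - 1),
    "Postgres-bigint": (-2**63, 2**63 - 1),
}

# Symbol type bigint: integer constrained to [-2^63, 2^63 - 1].
bigint = {"greater_equal": -2**63, "less_equal": 2**63 - 1}

def carrier_can_hold(carrier: str, constraints: dict) -> bool:
    """True if the carrier's range covers every value the constrained symbol type allows."""
    low, high = CARRIER_RANGES[carrier]
    return low <= constraints["greater_equal"] and constraints["less_equal"] <= high

suitable = [c for c in CARRIER_RANGES if carrier_can_hold(c, bigint)]
print(suitable)
```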

The constraints used for a symbol type must match the symbol type. For example, a constraint greater_equal: −2⁶³ matches the symbol type integer. Symbol types that have direct mappings to concrete types (and therefore have carrier types) generally come from a small collection of well-known types (integer, string, float, bool, et cetera) and have well-established constraints. However, not all symbol types with matching concrete types have well-established constraints, and the authors of a data type containing the field population might choose to use the symbol type integer. In these cases, the burden is placed on the user who maps symbol types to concrete types, and then to carrier types, to choose appropriate substitution mappings.

Symbol types can have a complex hierarchy of mappings from the most generic symbol type (i.e., a strict symbol type) to specific representations in a carrier type. In an example, an IP_address type can represent both IPv4 and IPv6 addresses. If the authors of a data model wish to be specific about the type of IP address, they can use the derived symbol types IPv4_address or IPv6_address. These types have many representations, one of which is the concrete type Cidr. Therefore, both IPv4_address and IPv6_address have mappings to Cidr types with appropriate constraints (if available on the Cidr type). The symbol type IP_address also has a mapping to the concrete type Cidr, and all of the Cidr types have mappings to the carrier type Postgres-cidr. However, there are many possible representations of IP_address, one of which is ip_address, a composite struct type consisting of a bool and an array of 16 bytes. The symbol types bool and byte have further mappings.

Relationship tables 750 can describe a variety of relationship types. For example, the symbol types IPv4_address and IPv6_address are derived from IP_address. The mappings to Cidr and ip_address use a Concrete_type_of relationship (discussed herein). The mappings from Cidr to Postgres-cidr use the Carrier_type_of relationship, discussed herein.

The Carrier_type_of relationship, which can be stored in one or more of the tables in relationship tables 750, defines which carrier types can represent a concrete type, and maps a concrete type to a carrier type:

Carrier_type_of(concrete_type, carrier_type)

For example, the example above describes the relationship:

Carrier_type_of(Cidr, Postgres-cidr)

In embodiments, the number of carrier types and concrete types may be small, and so the Carrier_type_of relationship may also be small. Therefore, concrete and carrier types can be carefully crafted and mapped via the Carrier_type_of relationship.

However, it is possible to extend the Carrier_type_of relationship with a relationship that describes whether one data type can be represented by another data type, referred to herein as the “rep_by” relationship. It is understood that this and other relationships referred to herein can be given other names or notations without departing from the scope or spirit of the innovation. Rep_by can relate a concrete type to a concrete type, or a carrier type to a carrier type:

rep_by (concrete_type, concrete_type)

rep_by (carrier_type, carrier_type)

The idea behind rep_by(D1, D2) is that D1 and D2 have “compatible” representations (e.g., both are strings, both are integers) and every value in D1 exists in D2. For example, let Int32 be a signed 32-bit integer, Uint32 be an unsigned 32-bit integer, and Int64 be a signed 64-bit integer. It follows that:

rep_by (Int32, Int64)

rep_by (Uint32, Int64)

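However, neither rep_by(Int32, Uint32) nor rep_by(Uint32, Int32) holds: Int32 includes negative values that Uint32 cannot hold, and Uint32 includes values above 2³¹−1 that Int32 cannot hold.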
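Because Int32, Uint32, and Int64 share a compatible (integer) representation, rep_by between them reduces to checking that one value range contains the other. A minimal sketch; the helper names `int_range` and `rep_by` are assumed for illustration.

```python
def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Inclusive value range of a fixed-width integer type."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

RANGES = {
    "Int32": int_range(32, signed=True),
    "Uint32": int_range(32, signed=False),
    "Int64": int_range(64, signed=True),
}

def rep_by(d1: str, d2: str) -> bool:
    """For integer types, every value of d1 exists in d2 iff d2's range contains d1's."""
    (low1, high1), (low2, high2) = RANGES[d1], RANGES[d2]
    return low2 <= low1 and high1 <= high2

for pair in [("Int32", "Int64"), ("Uint32", "Int64"), ("Int32", "Uint32"), ("Uint32", "Int32")]:
    print(pair, rep_by(*pair))
```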

The rep_by relationship can be used to extend the carrier_type_of relationship by using the following two inference rules:

If carrier_type_of(D1, X-D2) and rep_by(X-D2, X-D3), then carrier_type_of(D1, X-D3); and

If rep_by(D1, D2) and carrier_type_of(D2, X-D2), then carrier_type_of(D1, X-D2).

An example of carrier_type_of inference using concrete types Int32, Int64, and Uint32, and carrier types Postgres-integer and Postgres-bigint, is provided where:

Int32 and Uint32 are each rep_by Int64;

Postgres-integer is a Carrier_type_of Int32; and

Postgres-integer is rep_by Postgres-bigint, which is a Carrier_type_of Int64 and an inferred carrier_type_of both Int32 and Uint32.

The natural carrier_type of Int32 within Postgres is Postgres-integer. However, Postgres-bigint is a larger type (as stated by rep_by(Postgres-integer, Postgres-bigint)), so it can be inferred that carrier_type_of(Int32, Postgres-bigint). This inferred relationship can be useful if the user desires, for example, that all integer types in Postgres be uniform. Conversely, there is no natural carrier_type of Uint32 in Postgres, so the data type database maintainer might not create any mappings from Uint32 to a data type in Postgres. However, rep_by(Uint32, Int64) holds, and so it can be inferred that the carrier_type of Int64, Postgres-bigint, can serve as a carrier_type of Uint32.
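The two inference rules can be applied repeatedly until no new pairs appear. This hypothetical sketch keeps concrete and carrier rep_by pairs in a single set (assuming their names never collide) and reproduces the inferences for Int32 and Uint32 discussed above.

```python
def infer_carrier_types(carrier_type_of: set, rep_by: set) -> set:
    """Close carrier_type_of (concrete, carrier) pairs under the two inference rules."""
    inferred = set(carrier_type_of)
    while True:
        new = set()
        for concrete, carrier in inferred:
            for smaller, larger in rep_by:
                if smaller == carrier:   # rule 1: a larger carrier also works
                    new.add((concrete, larger))
                if larger == concrete:   # rule 2: reuse a larger concrete type's carrier
                    new.add((smaller, carrier))
        if new <= inferred:
            return inferred
        inferred |= new

carrier_type_of = {("Int32", "Postgres-integer"), ("Int64", "Postgres-bigint")}
rep_by = {("Int32", "Int64"), ("Uint32", "Int64"), ("Postgres-integer", "Postgres-bigint")}

for pair in sorted(infer_carrier_types(carrier_type_of, rep_by)):
    print(pair)
```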

Given that rep_by is useful for inferring additional carrier_type_of relationships, it would be useful to infer rep_by relationships automatically as well. The definition of rep_by makes one kind of inference easy, as rep_by is transitive:

If rep_by(D1, D2) and rep_by(D2, D3), then rep_by(D1, D3).

It can be observed that if concrete type C1 is derived_from C2, then C1 has constraints in addition to those that constrain C2, and so every value c in C1 is also in C2. Therefore:

If C1 and C2 are basic concrete, and C1 is derived_from C2, then rep_by(C1, C2).

This derived_from inference rule is used to infer that the carrier_type of the constrained Cidr concrete types is Postgres-cidr. The relationship carrier_type_of(Cidr, Postgres-cidr) would exist in the data type database, and the carrier_type_of relationship can be inferred automatically for all concrete types derived_from Cidr.

It is possible to develop automated inference rules for carrier and concrete types, based on reasoning about the properties of the types, expressed as constraints. One framework for this reasoning is to use two types of constraints, which can be stored in the constraint tables (e.g., value constraints 722, container constraints 724, field pattern constraints 726, and enumeration constraints 728) of data model database 700. First, range constraints describe the range of values that can be represented in the type. For example, integer greater-than and less-than constraints are range constraints. Second, encoding constraints describe how “fine grained” the encoding of the data is. For example, Unicode is a larger encoding than ASCII.

For a data type D, let r(D) be its range constraints and e(D) be its encoding constraints, so that r(D1)<=r(D2) if the range of D1 is smaller than or equal to the range of D2 (and similarly for e(D)). From this, the following inference meta-rule can be developed:

If r(D1)<=r(D2) and e(D1)<=e(D2), then rep_by(D1, D2).

An algorithm for computing r(D1)<=r(D2) or e(D1)<=e(D2) is not generalized, as these rules are domain specific. However, these rules can be automated and applied in specific domains.
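As one domain-specific instance of the meta-rule, string types could be compared by maximum length (standing in for the range) and by an ordered encoding scale (ASCII below Unicode). Both the length-as-range choice and the type names are assumptions made for this sketch.

```python
# Assumed encoding scale: a higher rank can encode everything a lower rank can.
ENCODING_RANK = {"ASCII": 0, "Unicode": 1}

# Hypothetical string types described by (maximum length, encoding).
STRING_TYPES = {
    "ascii_code_10": (10, "ASCII"),
    "ascii_text_255": (255, "ASCII"),
    "unicode_text_255": (255, "Unicode"),
}

def rep_by(d1: str, d2: str) -> bool:
    """Meta-rule check; in this toy domain the condition is treated as necessary too."""
    (len1, enc1), (len2, enc2) = STRING_TYPES[d1], STRING_TYPES[d2]
    range_ok = len1 <= len2                                   # r(D1) <= r(D2)
    encoding_ok = ENCODING_RANK[enc1] <= ENCODING_RANK[enc2]  # e(D1) <= e(D2)
    return range_ok and encoding_ok

print(rep_by("ascii_code_10", "unicode_text_255"))
```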

For example, consider the collection of Postgres numeric types (except for auto-increment types). Such numeric types can be described according to a name, storage size, description, and range. The types can include smallint (2 bytes), integer (4 bytes), bigint (8 bytes), decimal (variable size), numeric (variable size), real (4 bytes), and double precision (8 bytes). Based on range and encoding properties, a skeleton of the rep_by relationships among the Postgres numeric types can be derived, which shows that smallint is rep_by integer, which is rep_by bigint, which is rep_by decimal or numeric; that real is rep_by double precision, which is rep_by decimal and numeric; and that decimal is rep_by numeric and numeric is rep_by decimal. This collection of rep_by relationships is the transitive closure of relationships for these types.
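The closure can be computed from the skeleton edges by applying the transitivity rule until nothing changes. A minimal sketch; the edge list restates the Postgres relationships above.

```python
SKELETON = {
    ("smallint", "integer"), ("integer", "bigint"),
    ("bigint", "decimal"), ("bigint", "numeric"),
    ("real", "double precision"),
    ("double precision", "decimal"), ("double precision", "numeric"),
    ("decimal", "numeric"), ("numeric", "decimal"),
}

def transitive_closure(pairs: set) -> set:
    """Apply rep_by(D1, D2) and rep_by(D2, D3) => rep_by(D1, D3) to a fixed point."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived

closure = transitive_closure(SKELETON)
print(sorted(b for a, b in closure if a == "smallint"))
```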

There is also a concrete_type_of relationship. As the examples suggest, the concrete_type_of relationship maps a basic symbol type to another type. The target type can be any type in the database except another basic symbol type. That is, valid target types are concrete basic types and composite types (struct and container).

With regard to symbol type substitution, symbol types can be used to help determine composite type similarity, and to enable translation from one data representation to another. A relationship can be defined between types:

concrete_type_of (symbol, replacement)

to indicate that replacement is a (more) concrete representation of type symbol. The type symbol must be a symbol basic type, and the type replacement cannot be a symbol basic type. For example, consider the symbol type boolean. There are several concrete representations, including Boolean (the concrete native Boolean type in, e.g., JSON), boolean_str (derived from String with the constraint that it take a True or False value), and boolean_int (derived from Integer with the constraint that it take a 0 or 1 value). The following three relationships can be asserted:

concrete_type_of(boolean, Boolean)

concrete_type_of(boolean, boolean_str)

concrete_type_of(boolean, boolean_int)

This relationship can be used to create a pure concrete composite type from a symbol composite type. For example, given the data type:

Vm_status:
    Uptime: Integer
    Failed: boolean

Vm_status can be transformed into the pure concrete type Vm_status_str by replacing the symbol type boolean with a concrete type that appears in the second position of a concrete_type_of relationship whose first position is boolean. For example:

Vm_status_str:
    Uptime: Integer
    Failed: boolean_str
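The substitution can be sketched as a field-by-field rewrite of a struct. The dictionaries and the `choose` callback are hypothetical; `choose` stands in for whatever policy (or user) selects among the concrete_type_of targets.

```python
CONCRETE_TYPE_OF = {"boolean": ["Boolean", "boolean_str", "boolean_int"]}
SYMBOL_BASIC = set(CONCRETE_TYPE_OF)  # assumed: boolean is the only symbol basic type here

def make_concrete(struct: dict, choose) -> dict:
    """Replace each symbol basic field type with a chosen concrete_type_of target."""
    return {
        name: choose(CONCRETE_TYPE_OF[t]) if t in SYMBOL_BASIC else t
        for name, t in struct.items()
    }

vm_status = {"Uptime": "Integer", "Failed": "boolean"}
vm_status_str = make_concrete(vm_status, choose=lambda targets: targets[1])
print(vm_status_str)
```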

The target of the concrete_type_of relationship does not need to be a pure concrete type. However, it should be “more concrete” than the source. For example, consider a symbol data type routing_table. One representation of a routing table could be, using a simplified data type description language:

routing_table_a:
    List:
        subnet: masked_IP_address
        interface: Integer

Here, subnet is of type masked_IP_address, which is a symbol type. There are many representations of a masked_IP_address. One is the concrete type Masked_IP_address, which has the carrier_type Postgres-cidr. Another is a type masked_ip, which is a pair (IP: IP_address, mask: Integer). masked_ip is a symbol type because IP_address is a symbol type with many possible representations. These relationships can be asserted by:

concrete_type_of(routing_table, routing_table_a)

concrete_type_of(masked_IP_address, Masked_IP_address)

concrete_type_of(masked_IP_address, masked_ip)

As suggested, data model database 700 also includes inferred relationships, which can be explicitly or inferentially stored in one or more of relationship tables 750. Many relationships (e.g., concrete_type_of relationships) can be inferred. Let a be a symbol type and let C be a concrete type such that the relationship concrete_type_of(a, C) is asserted. Let a′ be a type that is derived_from a by adding constraints. Let C′ be derived from C by adding the new constraints of a′ to C. Then concrete_type_of(a′, C′) can be asserted. If C′ does not exist, it can be constructed as long as the new constraints of a′ can be applied to C.
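The inference for derived symbol types can be sketched as follows. The port type, the `APPLICABLE` table of which constraint kinds each concrete type accepts, and the naming scheme for the constructed C′ are all hypothetical; a real implementation would consult the constraint tables.

```python
# Hypothetical: constraint kinds that can be attached to each concrete type.
APPLICABLE = {
    "Integer": {"greater_equal", "less_equal"},
    "integer_str": {"pattern", "max_length"},  # integers encoded as digit strings
}
CONCRETE_TYPE_OF = {("integer", "Integer"), ("integer", "integer_str")}

def infer_concrete_type_of(a_prime, a, new_constraints, concrete_defs):
    """For a' derived from a, assert concrete_type_of(a', C') for each usable C."""
    inferred = set()
    for source, c in sorted(CONCRETE_TYPE_OF):
        if source != a:
            continue
        if not set(new_constraints) <= APPLICABLE.get(c, set()):
            continue  # the new constraints cannot be applied to C, so C' is not built
        c_prime = f"{c}_{a_prime}"  # hypothetical name for the constructed C'
        # Construct C' only if it does not already exist.
        concrete_defs.setdefault(c_prime, {"derived_from": c, **new_constraints})
        inferred.add((a_prime, c_prime))
    return inferred

concrete_defs = {}
port = {"greater_equal": 0, "less_equal": 65535}
print(infer_concrete_type_of("port", "integer", port, concrete_defs))
print(concrete_defs)
```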

Relationships can be used to provide expansion trees for particulartypes.

Because the target of the concrete_type_of relationship is only lightly constrained, various anomalies can occur. For example, the process of replacing a symbol data type with its concrete_type_of target might proceed indefinitely. For another example, it might not be possible to transform a symbol data type into a pure concrete data type using only the replacement transformations in concrete_type_of.

To determine whether concrete_type_of substitutions can lead to a valid concrete type, the expansion tree (expanding symbol types into concrete types) can be developed. There are two types of nodes in an expansion tree, component nodes and expansion nodes, and two kinds of edges, expansion edges and component edges. Expansion edges lead from component nodes to expansion nodes, and component edges lead from expansion nodes to component nodes.

Given composite data type d, let ref_set(d) be the set of all data types directly and transitively referenced by d. For example, if d is a struct with fields {foo: e, bar: g, zed: e}, then ref_set(d)={e, g}. If d is a map with key k and value v, then ref_set(d)={k, v}. If v is a composite type with fields {v1, v2}, then v can be replaced by v1 and v2, so that ref_set(d)={k, v1, v2}.

Let direct_ref(d) be the set of data types directly referenced by d. That is, it can be asserted that direct_ref(d)={e, g} (or {k, v}) without further processing. To compute ref_set(d), the following algorithm can be applied:

Set ref_set(d)=direct_ref(d); and

While there is a composite type c in ref_set(d),

set ref_set(d)=ref_set(d) Union direct_ref(c) Minus c.

It is understood that this algorithm is for example purposes only, and others can be utilized without departing from the scope or spirit of the innovation.
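The ref_set loop can be written directly in Python. This sketch adds one guard not stated in the algorithm above: each composite is expanded at most once, so that mutually recursive composite types cannot make the loop run forever. Types are represented by name, with `direct_refs` (a hypothetical mapping) listing each composite's direct_ref set.

```python
def ref_set(d: str, direct_refs: dict) -> set:
    """Basic types reachable from composite d; direct_refs maps composite -> direct_ref."""
    result = set(direct_refs[d])
    expanded = {d}
    while True:
        composite = next((t for t in result if t in direct_refs), None)
        if composite is None:
            return result                      # only basic types remain
        if composite not in expanded:          # added guard: expand each composite once
            expanded.add(composite)
            result |= set(direct_refs[composite])
        result.discard(composite)              # ... Union direct_ref(c) Minus c

struct_example = {"d": ["e", "g", "e"]}             # fields foo: e, bar: g, zed: e
map_example = {"d": ["k", "v"], "v": ["v1", "v2"]}  # map with key k, value v = {v1, v2}
print(ref_set("d", struct_example), ref_set("d", map_example))
```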

The expansion tree of symbol data type d, expansion(d), is constructed recursively by the following rules:

The root, r, of expansion(d) is a component node labeled with d;

if a component node c is labeled with d, d is symbol basic, and d occurs only once (i.e., at c) in the root-to-leaf path from r to c, then for each rule in concrete_type_of with source d and target d′, create an expansion node e labeled with d′ and a substitution (expansion) edge from c to e; and

if an expansion node e is labeled with d, d is not basic, and d occurs only once (i.e., at e) in the root-to-leaf path from r to e, then for each d′ in ref_set(d), create a component node c with label d′ and a component edge from e to c.

The single-occurrence rule ensures that the tree is finite.
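The three construction rules translate into two mutually recursive functions, one per node kind. The sketch below encodes the d1 example that follows as hypothetical lookup tables (`REF_SET` is assumed to be precomputed as described above); each node carries the set of its ancestors' labels so that the single-occurrence rule can be enforced.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    kind: str  # "component" or "expansion"
    children: list["Node"] = field(default_factory=list)

# Hypothetical tables encoding the d1 example.
SYMBOL_BASIC = {"d1", "d6", "d8"}
CONCRETE_BASIC = {"E1", "D4", "D7", "D9"}
CONCRETE_TYPE_OF = {"d1": ["E1", "e2", "e3"], "d8": ["e4", "e5"]}
REF_SET = {"e2": ["D4", "d6"], "e3": ["D7", "d8"], "e4": ["D9", "d8"], "e5": ["d1", "D7"]}

def expansion(d: str) -> Node:
    def component(label: str, ancestors: frozenset) -> Node:
        node = Node(label, "component")
        # Rule 2: expand a symbol basic type that does not already occur above it.
        if label in SYMBOL_BASIC and label not in ancestors:
            for target in CONCRETE_TYPE_OF.get(label, []):
                node.children.append(expand(target, ancestors | {label}))
        return node

    def expand(label: str, ancestors: frozenset) -> Node:
        node = Node(label, "expansion")
        # Rule 3: add the components of a non-basic type that does not occur above it.
        is_basic = label in SYMBOL_BASIC or label in CONCRETE_BASIC
        if not is_basic and label not in ancestors:
            for ref in REF_SET[label]:
                node.children.append(component(ref, ancestors | {label}))
        return node

    return component(d, frozenset())  # Rule 1: the root is a component node labeled d

def show(node: Node, depth: int = 0) -> None:
    print("  " * depth + f"{node.kind}: {node.label}")
    for child in node.children:
        show(child, depth + 1)

root = expansion("d1")
show(root)
```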

In an example, an expansion tree of d1 can have a root that is a component node labeled with d1. There can be three concrete_type_of rules with d1 as the source, so there are three expansion nodes (e.g., labeled E1, e2, and e3). E1 can be a basic concrete type, so no further expansion is performed. Data type e2 can be a struct containing basic types D4 and d6. D4 can be concrete, but d6 can be symbol. There are no concrete_type_of rules with d6 as the source, so no further nodes are created. Data type e3 can be a struct containing basic concrete type D7 and basic symbol type d8. There can be two concrete_type_of rules with d8 as the source, with targets e4 and e5. Data type e4 can expand to D9 and d8. D9 can be basic concrete, and d8 already appears in the root-to-leaf path, so no further expansion is performed. Data type e5 can expand into d1 and basic concrete type D7. Type d1 is the root, so no further expansion is performed.

Data model database 700 also accounts for cyclic expansions. A basic symbol data type d has cyclic expansions if there is some leaf l in expansion(d) with label d′ such that d′ occurs twice in the root-to-leaf path from r to l. Basic symbol data type d has root-cyclic expansions if there is some leaf l in expansion(d) such that l is labeled with d. Every data type d that has cyclic expansions either has root-cyclic expansions, or expansion(d) contains a node labeled with d′ where d′ has root-cyclic expansions.

Even if a symbol data type d has cyclic expansions, it might have some expansions that are not cyclic. The following is an algorithm that can mark concrete_type_of substitution rules as leading to invalid results:

In expansion(d), mark as cyclic all leaves whose labels appear twice in their root-to-leaf path. This marking can be performed while computing expansion(d);

Starting from the leaves and working recursively towards the root: an expansion node is marked cyclic if at least one of its children is marked cyclic; and a component node is marked cyclic if each of its children is marked cyclic.

For example, consider the expansion tree expansion(d1) described above. Leaves d8 and d1 are marked cyclic because their data types occur twice in their root-to-leaf paths. Moving one level up, e4 and e5 are marked cyclic because they are expansion nodes and one of their children is marked cyclic. Moving up again, component node d8 is marked cyclic because both of its children are marked cyclic. Moving up a last time, expansion node e3 is marked cyclic because at least one of its children is. The root node is not marked cyclic because it is a component node and it has at least one child not marked cyclic.
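The cyclic marking can be sketched on a hand-built copy of expansion(d1). Nodes are `(label, kind, children)` tuples with kind "C" for component and "E" for expansion, and marked nodes are reported by their root-to-node label paths. Leaves are handled only by the first rule; otherwise a childless component node would be marked vacuously by the "each of its children" rule.

```python
# Hand-built expansion(d1) from the example: (label, kind, children).
TREE = ("d1", "C", [
    ("E1", "E", []),
    ("e2", "E", [("D4", "C", []), ("d6", "C", [])]),
    ("e3", "E", [
        ("D7", "C", []),
        ("d8", "C", [
            ("e4", "E", [("D9", "C", []), ("d8", "C", [])]),
            ("e5", "E", [("d1", "C", []), ("D7", "C", [])]),
        ]),
    ]),
])

def cyclic_marks(tree) -> set:
    """Return the root-to-node label paths of all nodes marked cyclic."""
    marked = set()

    def visit(node, path) -> bool:
        label, kind, children = node
        here = path + (label,)
        if not children:
            cyclic = label in path  # leaf whose label already occurs above it
        else:
            flags = [visit(child, here) for child in children]  # visit every child
            cyclic = any(flags) if kind == "E" else all(flags)
        if cyclic:
            marked.add(here)
        return cyclic

    visit(tree, ())
    return marked

marks = cyclic_marks(TREE)
for path in sorted(marks):
    print(" -> ".join(path))
```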

If the root of the expansion tree is not marked cyclic, then there are valid concrete_type_of substitutions that can lead to an equivalent (and more concrete) representation. All concrete_type_of rules that correspond to a substitution edge leading from an unmarked component node to a marked expansion node must be avoided.

Algorithms can also be established for finding incomplete substitutions. Cyclic expansions can correspond to errors in the concrete_type_of substitution rules. Let acyclic_expansion(d) be expansion(d) with all subtrees rooted at a node marked cyclic removed. In acyclic_expansion(d), there may be concrete_type_of substitution rules that lead to expansions containing basic symbol data types that are not the source of any concrete_type_of rules. While these paths are not incorrect, user intervention may be employed to replace any remaining basic symbol data types with pure concrete equivalents. An algorithm for finding incomplete substitutions is similar to that for finding cyclic substitutions:

In acyclic_expansion(d), mark all leaves which are symbol as incomplete.This marking can be performed while computing expansion(d);

Starting from the leaves and working recursively towards the root: an expansion node is marked incomplete if at least one of its children is marked incomplete; and a component node is marked incomplete if each of its children is marked incomplete.

If a node is not marked incomplete, it can be considered complete. For example, consider the tree acyclic_expansion(d1). Leaf d6 is symbol, and therefore incomplete. Its parent, e2, is an expansion node and is therefore incomplete because one of its children is incomplete. The parent of e2, d1, is complete because at least one of its children (E1) is complete.
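The incomplete marking has the same shape as the cyclic one. This sketch hand-builds acyclic_expansion(d1) by dropping the cyclic e3 subtree, uses the same `(label, kind, children)` tuples, and treats d1, d6, and d8 as the symbol types.

```python
SYMBOL = {"d1", "d6", "d8"}

# acyclic_expansion(d1): expansion(d1) with the subtree rooted at cyclic e3 removed.
ACYCLIC_TREE = ("d1", "C", [
    ("E1", "E", []),
    ("e2", "E", [("D4", "C", []), ("d6", "C", [])]),
])

def incomplete_marks(tree) -> set:
    """Return the root-to-node label paths of all nodes marked incomplete."""
    marked = set()

    def visit(node, path) -> bool:
        label, kind, children = node
        here = path + (label,)
        if not children:
            incomplete = label in SYMBOL  # symbol leaf: no further substitution exists
        else:
            flags = [visit(child, here) for child in children]
            incomplete = any(flags) if kind == "E" else all(flags)
        if incomplete:
            marked.add(here)
        return incomplete

    visit(tree, ())
    return marked

print(sorted(incomplete_marks(ACYCLIC_TREE)))
```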

Expansion trees can be used to transform symbol types to pure concrete types. The expansion tree of symbol basic data type d indicates the possible choices for converting d into a concrete type. There are two basic approaches for using the expansion tree. In the first, a replacement for d is selected from among its children nodes that are marked neither cyclic nor incomplete (or, alternatively, not marked cyclic regardless of whether marked incomplete). The replacement for d, e, might have a symbol basic type d′ in ref_set(e), but the choice of replacement type for d′ is deferred. In the second, a replacement is chosen for each symbol basic data type in the tree defined by the chosen edges among the expansion edges. This can be performed via the following algorithm:

Set worklist=[d];

Set chosen_edges={ };

While worklist is not empty:

Extract and remove d′ from the front of worklist;

Choose an expansion edge X of d′ that leads to an expansion node e that is not marked cyclic or incomplete (alternatively, not marked cyclic), and add X to chosen_edges;

For each component node c that is a child of e, if c is basic symbol, add c to the end of worklist.

In the example expansion tree described above, the basic symbol type d1 has two replacements that are neither incomplete nor cyclic, e1 and e3. The first approach would choose either of the two and stop. If e3 is chosen, the substitution process will eventually need to replace d8, but that choice is deferred until d8 is encountered, at which point the expansion tree of d8 would be built. The second approach would find all of the replacements. If e3 is chosen to replace d1, the algorithm would find that d8 is a basic symbol data type among the component nodes of e3. The algorithm would continue and choose e4 as the replacement of d8. Since e4 has only concrete basic types in its ref_set, the algorithm stops.
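The worklist procedure can be sketched as follows. It is a sketch under stated assumptions: the EXPANSIONS table, the SYMBOL_TYPES set, and the `prefer` argument are hypothetical stand-ins for a real expansion tree and for the user's choice among unmarked expansions, and chosen edges are recorded as type-to-expansion pairs. A guard that skips already-replaced types is an addition the steps above do not spell out.

```python
from collections import deque

# Hypothetical expansion data: symbol type -> [(expansion, referenced types, marked)],
# where marked is True if the expansion node was marked cyclic or incomplete.
EXPANSIONS = {
    "d1": [("e1", ["int32"], False), ("e2", ["d6", "str"], True), ("e3", ["d8", "bool"], False)],
    "d8": [("e4", ["uint8", "float64"], False)],
}
SYMBOL_TYPES = {"d1", "d6", "d8"}


def choose_edges(d, prefer=None):
    """Pick one unmarked expansion for d and for every basic symbol type it pulls in."""
    prefer = prefer or {}
    worklist = deque([d])
    chosen = {}                                    # symbol type -> chosen expansion
    while worklist:
        current = worklist.popleft()
        if current in chosen:                      # guard: already replaced elsewhere
            continue
        candidates = [x for x in EXPANSIONS.get(current, []) if not x[2]]
        if not candidates:
            raise ValueError(f"no unmarked expansion for {current}")
        # Use the caller's preferred expansion if it is a candidate, else the first one.
        name, refs, _ = next((x for x in candidates if x[0] == prefer.get(current)),
                             candidates[0])
        chosen[current] = name
        worklist.extend(r for r in refs if r in SYMBOL_TYPES)
    return chosen


print(choose_edges("d1", prefer={"d1": "e3"}))  # {'d1': 'e3', 'd8': 'e4'}
```

Calling `choose_edges("d1")` with no preference picks e1, which references no symbol types, so the loop ends after a single step.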

Transformation of a symbol data type to a pure concrete data type relies on the dependency tree (discussed herein). Before a data type can be deployed, it must be transformed into a pure carrier data type, which also relies on the dependency tree (discussed herein).

Similarity can be determined using symbol types. Consider, for example, two flow measurements, flow_a and flow_b:

Flow_a:

Source: IPv4_str

Dest: IPv4_str

Packets: Integer

Collection_time: timestamp_float

Flow_b:

SourceIP: IPv4_int

DestIP: IPv4_int

Sum: Integer

Measurement time: timestamp_ieee

Here IPv4_str represents an IPv4 address as a dotted-quad string, IPv4_int uses a 32-bit unsigned integer, timestamp_float represents a timestamp as the number of seconds since 1970-01-01 00:00:00, and timestamp_ieee uses the IEEE standard for representing a timestamp as a string. The following relationships are then in the concrete_type_of table:

concrete_type_of(IPv4, IPv4_str)

concrete_type_of(IPv4, IPv4_int)

concrete_type_of(timestamp, timestamp_float)

concrete_type_of(timestamp, timestamp_ieee)

By substituting the symbol type for each concrete type according to the concrete_type_of relationship, flow_a and flow_b can be transformed into:

Flow_a_sym:

Source: IPv4

Dest: IPv4

Packets: Integer

Collection_time: timestamp

Flow_b_sym:

SourceIP: IPv4

DestIP: IPv4

Sum: Integer

Measurement time: timestamp

Because flow_a_sym and flow_b_sym are quads with the same collections of types, flow_a_sym and flow_b_sym have a high similarity. Therefore flow_a and flow_b have a high similarity even though they appear to be, or may actually be, composed of different types.
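Reverse-matching can be sketched by mapping each concrete type back to its symbol type and then comparing the field-type multisets. The reverse index assumes each concrete type has a single symbol source (multiple sources would require branching), and the Jaccard-style score is an illustrative assumption; the text does not fix a particular similarity measure.

```python
from collections import Counter

# concrete_type_of(symbol, concrete) facts from the example.
CONCRETE_TYPE_OF = [
    ("IPv4", "IPv4_str"), ("IPv4", "IPv4_int"),
    ("timestamp", "timestamp_float"), ("timestamp", "timestamp_ieee"),
]
# Reverse index: concrete type -> the symbol type it is a concrete form of.
TO_SYMBOL = {concrete: symbol for symbol, concrete in CONCRETE_TYPE_OF}

flow_a = {"Source": "IPv4_str", "Dest": "IPv4_str",
          "Packets": "Integer", "Collection_time": "timestamp_float"}
flow_b = {"SourceIP": "IPv4_int", "DestIP": "IPv4_int",
          "Sum": "Integer", "Measurement time": "timestamp_ieee"}


def symbolize(struct):
    """Replace each field type by its symbol type where a concrete_type_of rule applies."""
    return {name: TO_SYMBOL.get(t, t) for name, t in struct.items()}


def type_similarity(a, b):
    """Multiset Jaccard overlap of the field types (1.0 means identical collections)."""
    ca, cb = Counter(a.values()), Counter(b.values())
    union = sum((ca | cb).values())
    return sum((ca & cb).values()) / union if union else 1.0


print(type_similarity(flow_a, flow_b))                        # 1/7: only Integer matches
print(type_similarity(symbolize(flow_a), symbolize(flow_b)))  # 1.0
```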

An algorithm for similarity can involve reverse-matching the concrete_type_of rules. Termination can be expected because the one-to-one match is basic concrete to basic symbol. Other rules reduce the number of fields in the fully expanded data type tree.

The replacement process can branch if there is more than one source for a given target. A constrained field can be an anonymous data type, which can be used to search for types in the target of concrete_type_of. There can be multiple targets, e.g., by separating the field constraints from the data type, or by combining them. Another source of multiple targets is multiple entries with the same target, e.g., concrete_type_of(a1, c1), concrete_type_of(a1, c2).

With classification and similarity explained, additional information regarding representations of data model database 700 is set forth below.

There can be a variety of container types, with details defined at least in part by container constraints 724. Information about container types (e.g., list, set, bag, map, and others) can be represented in multiple ways. To limit ambiguity, several options are available, including:

Defining containers as data_type records; that is, type Foo can be a list of integers, and field bar can be of type Foo, but field bar cannot be a list of integers;

Defining containers at the fields; that is, type Foo can be an integer, but not a list of integers, and field bar can be a list of Foo (or a list of integers directly); or

Containers can be defined both as data types and as properties offields.

Most data modeling languages (e.g., XSD, TOSCA, JSON Schema) define container types at the fields (with some exceptions for map types in XSD). This convention is a convenience for the model developer, as the developer can create a field which is, e.g., a list of integers, without searching for a pre-existing definition or defining the list type before continuing. Therefore, specifying containers when defining fields is convenient for the model developer.

However, saying that field Foo is a list of integers creates an anonymous type. These anonymous types make searching and similarity analysis more difficult, as cross-referencing (both by the user and by any analysis software) may be required to verify that two anonymous types “mean” the same thing. Furthermore, specifying container types at fields requires a duplication of effort, which can lead to errors and inconsistencies. Therefore, the cleanest model of container types always defines containers as data types, and never anonymously at fields.

Furthermore, if containers must be defined as fields, not data types, then some constructions cannot be modeled. For example, it can be desirable to define a list of Integers which must contain at least 1 and no more than 10 elements as a data type. Therefore, a core requirement for a universal data type dictionary is that data types can be container types.

The approach disclosed herein does not rely on anonymous data types at the fields. Instead, whenever a struct data type D has a field f which is a container, e.g., list<C>, a container data type D′, with payload C, is created and is also inserted into the database. The type of D.f is changed to D′, and then D is inserted into the database. As an optimization, the database can be checked for a matching data type which already exists.

While alternatives are available, any choice for the representation of container types involves benefits and drawbacks, and the representation described herein may simplify discussion of the analysis of the contents of the database. However, if a data type with a field that is a container of another type is stored in the database, its in-database representation shows a structure that is different from its textual representation, because the anonymous type has been explicitly named and separately recorded.

However, by making use of appropriate parsing metadata in the data type schema, enough information can be stored to recreate the original textual structure. For example, when list<C> is stored as data type D′, a parsing metadata indicator can be added recording that D′ was originally an anonymous type. Then, when reconstructing D into its textual representation, list<C> can be substituted for D′ as the data type of field f.
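A sketch of naming inline containers on ingest and restoring them for textual output follows. It assumes a hypothetical in-memory DataType record whose is_anonymous_type flag plays the role of the parsing-metadata indicator. Only single-payload containers are handled (a map would also need its value type), and the optional reuse step deliberately matches only other anonymous types so that reconstruction still yields the inline form.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataType:
    name: str
    data_type_class: str                         # 'struct', 'basic', 'list', 'set', 'bag'
    fields: dict = field(default_factory=dict)   # field -> type name or ('list', payload)
    key_data_type: str | None = None
    is_anonymous_type: bool = False


def hoist_containers(d: DataType, db: dict[str, DataType]) -> DataType:
    """Name each inline container field type and store it in db before storing d."""
    new_fields = {}
    for fname, ftype in d.fields.items():
        if isinstance(ftype, tuple):             # inline container such as ('list', 'C')
            cls, payload = ftype
            # Optimization: reuse a stored anonymous type with the same shape.
            match = next((t for t in db.values() if t.is_anonymous_type
                          and t.data_type_class == cls and t.key_data_type == payload), None)
            if match is None:
                match = DataType(f"anon_{d.name}_{fname}", cls,
                                 key_data_type=payload, is_anonymous_type=True)
                db[match.name] = match
            ftype = match.name
        new_fields[fname] = ftype
    stored = DataType(d.name, d.data_type_class, new_fields)
    db[stored.name] = stored
    return stored


def textual_fields(d: DataType, db: dict[str, DataType]) -> dict:
    """Substitute the original inline form back for every anonymous field type."""
    out = {}
    for fname, tname in d.fields.items():
        t = db.get(tname)
        if t is not None and t.is_anonymous_type:
            out[fname] = (t.data_type_class, t.key_data_type)
        else:
            out[fname] = tname
    return out


db: dict[str, DataType] = {}
stored = hoist_containers(DataType("D", "struct", {"f": ("list", "C"), "g": "integer"}), db)
print(stored.fields)               # {'f': 'anon_D_f', 'g': 'integer'}
print(textual_fields(stored, db))  # {'f': ('list', 'C'), 'g': 'integer'}
```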

To provide examples of a data_type schema (for, e.g., data types table 712) and a field schema (for, e.g., fields table 714), example SQL CREATE TABLE data definition commands are described which can create a collection of tables that in combination represent a data type. For brevity, these schemas do not contain auditing metadata, but those of ordinary skill in the art understand how to incorporate such aspects. Parsing metadata is also omitted, as many kinds can be included. When parsing metadata is needed for, e.g., textual reconstruction of a data type, that metadata can be included.

Italics, as used within the example schema, do not necessarily denote symbol data types. The schema is understood to be one example of a possible schema, and many equivalent representations are possible:

CREATE TABLE Data_types (
  name text NOT NULL,
  derived_from text REFERENCES Data_types(name),
  description text,
  data_type_class varchar(10) DEFAULT 'struct'
    CHECK (data_type_class IN ('basic', 'struct', 'list', 'set', 'map', 'bag')),
  is_symbol_type boolean NOT NULL,
  is_concrete_type boolean NOT NULL,
  is_carrier_type boolean NOT NULL,
  key_data_type text REFERENCES Data_types(name),
  value_data_type text REFERENCES Data_types(name),
  CONSTRAINT pk_data_type PRIMARY KEY (name),
  CHECK (NOT (data_type_class <> 'basic' AND is_symbol_type = true)),
  CHECK (NOT (data_type_class <> 'basic' AND is_concrete_type = true)),
  CHECK (NOT (data_type_class <> 'basic' AND is_carrier_type = true)),
  CHECK (NOT (is_carrier_type = true AND derived_from IS NOT NULL)),
  CHECK (key_data_type IS NULL OR key_data_type <> name),
  CHECK (value_data_type IS NULL OR value_data_type <> name)
);

CREATE TABLE Fields (
  name text NOT NULL,
  field_of text NOT NULL,
  description text,
  type_name text NOT NULL,
  default_value text,
  required boolean NOT NULL DEFAULT false,
  FOREIGN KEY (field_of) REFERENCES Data_types(name) ON DELETE CASCADE,
  FOREIGN KEY (type_name) REFERENCES Data_types(name),
  CONSTRAINT pk_fields PRIMARY KEY (name, field_of)
);

Each data type has a name, which is the primary key. derived_from indicates the parent data type, if any, while description contains any comments. The following fields classify the data type. data_type_class indicates whether the type is a container type and, if so, which kind (list, set, map, bag), or is not a container type (struct or basic). This collection of container types is intended to be illustrative, not an exhaustive listing of container types. If data_type_class = 'basic', then the values of the fields is_symbol_type, is_concrete_type, and is_carrier_type indicate whether the data type is symbol, concrete, or carrier. These values can only be true if data_type_class = 'basic'.
It is understood that checks on the proper labeling of a data type can be performed in multiple ways.
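To exercise the local CHECK constraints, the two tables can be created in an in-memory SQLite database from Python. This is a trimmed, SQLite-flavored adaptation of the DDL above, not a statement about the target DBMS. It assumes SQLite 3.23 or later for the TRUE/FALSE keywords, and the sample rows are hypothetical.

```python
import sqlite3

DDL = """
CREATE TABLE Data_types (
  name text NOT NULL PRIMARY KEY,
  derived_from text REFERENCES Data_types(name),
  description text,
  data_type_class varchar(10) DEFAULT 'struct'
    CHECK (data_type_class IN ('basic', 'struct', 'list', 'set', 'map', 'bag')),
  is_symbol_type boolean NOT NULL,
  is_concrete_type boolean NOT NULL,
  is_carrier_type boolean NOT NULL,
  key_data_type text REFERENCES Data_types(name),
  value_data_type text REFERENCES Data_types(name),
  CHECK (NOT (data_type_class <> 'basic' AND is_symbol_type)),
  CHECK (NOT (data_type_class <> 'basic' AND is_concrete_type)),
  CHECK (NOT (data_type_class <> 'basic' AND is_carrier_type)),
  CHECK (NOT (is_carrier_type AND derived_from IS NOT NULL)),
  CHECK (key_data_type IS NULL OR key_data_type <> name),
  CHECK (value_data_type IS NULL OR value_data_type <> name)
);
CREATE TABLE Fields (
  name text NOT NULL,
  field_of text NOT NULL REFERENCES Data_types(name) ON DELETE CASCADE,
  description text,
  type_name text NOT NULL REFERENCES Data_types(name),
  default_value text,
  required boolean NOT NULL DEFAULT FALSE,
  PRIMARY KEY (name, field_of)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")       # SQLite enforces foreign keys only if asked
conn.executescript(DDL)
conn.execute("INSERT INTO Data_types VALUES "
             "('integer', NULL, NULL, 'basic', TRUE, FALSE, FALSE, NULL, NULL)")


def try_insert(sql):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A struct cannot be flagged as a symbol type.
struct_as_symbol = try_insert("INSERT INTO Data_types VALUES "
                              "('bad', NULL, NULL, 'struct', TRUE, FALSE, FALSE, NULL, NULL)")
# A list cannot contain itself.
self_list = try_insert("INSERT INTO Data_types VALUES "
                       "('l', NULL, NULL, 'list', FALSE, FALSE, FALSE, 'l', NULL)")
# A list of an existing basic type is accepted.
int_list = try_insert("INSERT INTO Data_types VALUES "
                      "('ints', NULL, NULL, 'list', FALSE, FALSE, FALSE, 'integer', NULL)")
print(struct_as_symbol, self_list, int_list)  # False False True
```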

If the data type is a container type (data_type_class is list, set, bag, or map), then key_data_type and value_data_type indicate the data types being contained. If data_type_class is list, set, or bag, then key_data_type is the data type contained in the data type under consideration, and value_data_type is NULL. If data_type_class is map, then key_data_type is the data type of the map's keys, and value_data_type is the data type of the map's values. If data_type_class is struct or basic, then both key_data_type and value_data_type are NULL.

A container cannot contain itself (without causing infinite recursion), so if they are not NULL, neither key_data_type nor value_data_type can be equal to name.

If a Data_type record has a data_type_class value of ‘struct’, then the data type can have a list of fields. Following conventional relational composition, the table Fields contains the field field_of, which is a foreign key that joins with data_type.name.

A field is described by its name, and field_of is the name of the data type of which it is a part. (name, field_of) is the primary key of Fields. description contains user metadata about the field. The field must be of a particular type, whose name is stored in type_name. The default value of the field, if any, is stored in default_value. If the field must have a value (e.g., not Null), then required has value True, else False. Although required is a constraint, and there are separate constraint tables, modeling languages commonly have a separate ‘required’ property for fields. However, it is understood that the ‘required’ property can be moved to the collection of other constraints.

There are some constraints on the database involving derived_from that cannot be checked using localized constraints such as CHECK. Let d be a data_types record. If (again using italics to denote symbol types) d.derived_from is not NULL, then:

The value of data_type_class must be the same in the record and in the parent (named by derived_from);

if data_type_class is a container type (list, set, map, bag), then the value of value_data_type must either be equal to that of the parent, or must be a derived_from descendent of the value_data_type of the parent;

if data_type_class is a map, then the value of key_data_type must either be equal to that of the parent, or must be a derived_from descendent of the key_data_type of the parent;

if a record from the Fields table which joins to the data type has the same name as a field that is inherited through the derived_from chain, then: if field overriding is not permitted, let f be the field name, df its data type, da the nearest derived_from ancestor, and daf the data type of f in da; then df must be a derived_from descendent of daf; otherwise there is no constraint on the data type of f; and

The collection of name values in the records from Fields which join to the Data_type record should not intersect the set of name values in the records from Fields which join to any derived_from ancestor.
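The non-local derived_from rules can be checked in application code. This sketch assumes a hypothetical dict-of-records layout and covers the class rule, the container-payload rule (checked on both key_data_type and value_data_type), and the descendant rule for redefined fields; the `free_override` flag stands for unrestricted field overriding, and the name-disjointness rule is left out.

```python
def ancestors(db, name):
    """Yield the derived_from chain above name, nearest first."""
    parent = db[name]["derived_from"]
    while parent is not None:
        yield parent
        parent = db[parent]["derived_from"]


def is_descendant(db, child, parent):
    """True if child equals parent or is transitively derived_from parent."""
    if child is None:
        return False
    return child == parent or parent in ancestors(db, child)


def derived_from_violations(db, name, free_override=False):
    """Check the derived_from rules for one record; return a list of problems."""
    d = db[name]
    if d["derived_from"] is None:
        return []
    parent = db[d["derived_from"]]
    problems = []
    if d["data_type_class"] != parent["data_type_class"]:
        problems.append("data_type_class differs from parent")
    for col in ("key_data_type", "value_data_type"):
        if parent.get(col) is not None and not is_descendant(db, d.get(col), parent[col]):
            problems.append(f"{col} is not a derived_from descendent of the parent's")
    for fname, ftype in d.get("fields", {}).items():
        # Type of the field in the nearest ancestor that also declares it, if any.
        inherited = next((db[a]["fields"][fname] for a in ancestors(db, name)
                          if fname in db[a].get("fields", {})), None)
        if inherited is not None and not free_override \
                and not is_descendant(db, ftype, inherited):
            problems.append(f"field {fname}: {ftype} is not a descendent of {inherited}")
    return problems


db = {
    "float": {"derived_from": None, "data_type_class": "basic"},
    "string": {"derived_from": None, "data_type_class": "basic"},
    "anon_d0_gamma": {"derived_from": "float", "data_type_class": "basic"},
    "d2": {"derived_from": None, "data_type_class": "struct"},
    "d1": {"derived_from": "d2", "data_type_class": "struct", "fields": {"gamma": "float"}},
    "d0": {"derived_from": "d1", "data_type_class": "struct",
           "fields": {"gamma": "anon_d0_gamma"}},
    "bad": {"derived_from": "d1", "data_type_class": "struct", "fields": {"gamma": "string"}},
}
print(derived_from_violations(db, "d0"))   # []
print(derived_from_violations(db, "bad"))  # one problem: string does not derive from float
```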

As discussed, data model database 700 includes various constraint tables. Data types may have a collection of constraints. In most data modeling languages, fields can also have constraints. However, representing constraints in multiple ways makes data representation and analysis more complex, as constraints defined at the field level create anonymous types. So, just as anonymous container types defined as field types are named and stored in the data_types table (see Section 4.1.1), anonymous types created by constraints on field values are also named and stored in the Data_types table. If field f has type_name D with constraint set C, data type D′ is created, derived_from D with additional constraints C, and the type_name of f is set to D′. This procedure is illustrated in the example in Section 4.1.4.
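Naming constrained fields follows the same pattern as naming inline containers. A minimal sketch, assuming a hypothetical dict-based store and the anon_<type>_<field> naming used in the examples; fields without constraints keep their declared type.

```python
def hoist_field_constraints(type_name, fields, store):
    """Give each constrained field its own anonymous type derived_from the declared type.

    fields maps field name -> (declared type, [(predicate_type, literal), ...]).
    Returns field name -> the type name to record in Fields.type_name.
    """
    result = {}
    for fname, (base, constraints) in fields.items():
        if not constraints:                    # unconstrained: keep the declared type
            result[fname] = base
            continue
        anon = f"anon_{type_name}_{fname}"     # naming convention used in the examples
        store[anon] = {"derived_from": base, "is_anonymous_type": True,
                       "constraints": list(constraints)}
        result[fname] = anon
    return result


store = {}
result = hoist_field_constraints("d0", {"gamma": ("float", [("max_exclusive", "100.0")])}, store)
print(result)  # {'gamma': 'anon_d0_gamma'}
```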

Classes of constraints here include value constraints (as can be stored in, e.g., value constraints table 722). Value constraints describe the possible values of basic data types. These constraints are sometimes called facets, following the terminology of XSD. Value constraints are generally drawn from the following collection, although additional constraints are possible:

Enumeration: declares that the values of the data type must be drawn from a supplied list of possibilities.

Pattern: declares that string data must match a specified regular expression.

Min_inclusive: declares that ordered data must be larger than or equal to the specified value.

Min_exclusive: declares that ordered data must be larger than the specified value.

Max_inclusive: declares that ordered data must be smaller than or equal to the specified value.

Max_exclusive: declares that ordered data must be smaller than the specified value.

Min_length: declares that data that has a size (e.g., strings) must have a size greater than or equal to the specified integer length.

Max_length: declares that data that has a size (e.g., strings) must have a size less than or equal to the specified integer length.

Container constraints (stored in, e.g., container constraints table 724) describe possible values of containers as a whole. Additional constraints are possible, e.g., uniqueness of field combinations, but container constraints can include:

Min_length: declares that the container has at least the number of entries specified by an integer length.

Max_length: declares that the container has at most the number of entries specified by an integer length.

Constraints can also include field pattern constraints (as stored in, e.g., field pattern constraints table 726). Tree pattern schema languages such as Yang, JSON Schema, and XSD allow for variations in the data that can be present in a node. When translated into data types, these languages allow for a pattern of fields required to be present or absent. Some examples of field pattern constraints are:

Exactly_one_of: exactly one of the specified sets of fields is present.

At_most_one_of: at most one of the specified sets of fields is present.

Value constraints can apply to basic data types and to fields of a basic data type. Container constraints can apply to container data types and fields of container data types. Field pattern constraints can apply only to data types which are neither basic nor container types.
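Checking a value against value constraints, or a container against length constraints, can be sketched as below. The (predicate_type, literal) pairs mirror constraint table rows; converting a bound literal to the value's own Python type, treating patterns as fully anchored (as XSD patterns are), and passing enumeration literals as a list of strings are all assumptions of this sketch.

```python
import re


def satisfies(value, constraints):
    """Return True if value meets every (predicate_type, literal) constraint."""
    for predicate, literal in constraints:
        if predicate == "enumeration":
            ok = str(value) in literal              # literal: list of allowed strings
        elif predicate == "pattern":
            ok = re.fullmatch(literal, value) is not None
        elif predicate == "min_length":
            ok = len(value) >= int(literal)         # strings, or containers as a whole
        elif predicate == "max_length":
            ok = len(value) <= int(literal)
        else:
            bound = type(value)(literal)            # e.g. float("100.0")
            ok = {"min_inclusive": value >= bound, "min_exclusive": value > bound,
                  "max_inclusive": value <= bound, "max_exclusive": value < bound}[predicate]
        if not ok:
            return False
    return True


print(satisfies(99.5, [("max_exclusive", "100.0")]))                   # True
print(satisfies(100.0, [("max_exclusive", "100.0")]))                  # False
print(satisfies(45, [("enumeration", ["4", "5", "45"])]))              # True
print(satisfies("abc", [("pattern", "[a-z]+"), ("max_length", "2")]))  # False
print(satisfies([1, 2], [("min_length", "1"), ("max_length", "10")]))  # True
```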

CREATE TABLE Data_type_constraint (
  type_name text NOT NULL REFERENCES Data_types(name) ON DELETE CASCADE,
  id integer NOT NULL,
  description text,
  predicate_type varchar(15) NOT NULL
    CHECK (predicate_type IN ('enumeration', 'pattern', 'max_inclusive', 'max_exclusive',
                              'min_inclusive', 'min_exclusive', 'max_length', 'min_length',
                              'exactly_one_of', 'at_most_one_of')),
  literal text,
  CONSTRAINT pk_type_constraint PRIMARY KEY (type_name, id)
);

A common practice is to define a data type by constraining a basic data type. For example, a Boolean_string data type, representing the symbol type Boolean, can be a string restricted to the values (‘True’, ‘False’). To make this set property searchable, additional tables are created to represent the enumeration values.

CREATE TABLE Enumerated_literal_data_type (
  type_name text NOT NULL,
  constraint_id integer NOT NULL,
  literal text NOT NULL,
  description text,
  CONSTRAINT pk_enumerated_literal_dt PRIMARY KEY (type_name, constraint_id, literal),
  FOREIGN KEY (type_name, constraint_id)
    REFERENCES Data_type_constraint(type_name, id) ON DELETE CASCADE
);
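The separate enumeration table is what makes a set-valued search possible. As a sketch, the query below, run against SQLite from Python, finds every type whose enumeration is exactly {'True', 'False'}, regardless of the order in which the literals were declared. The type names are hypothetical, and the Data_types table and its foreign key are omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Data_type_constraint (
  type_name text NOT NULL, id integer NOT NULL, description text,
  predicate_type varchar(15) NOT NULL, literal text,
  PRIMARY KEY (type_name, id));
CREATE TABLE Enumerated_literal_data_type (
  type_name text NOT NULL, constraint_id integer NOT NULL,
  literal text NOT NULL, description text,
  PRIMARY KEY (type_name, constraint_id, literal),
  FOREIGN KEY (type_name, constraint_id)
    REFERENCES Data_type_constraint(type_name, id) ON DELETE CASCADE);
""")

# Hypothetical enumerated string types.
rows = {"Boolean_string": ["True", "False"],
        "Flag_string": ["False", "True"],
        "Yes_no": ["yes", "no"]}
for name, literals in rows.items():
    conn.execute("INSERT INTO Data_type_constraint VALUES (?, 0, NULL, 'enumeration', NULL)",
                 (name,))
    conn.executemany("INSERT INTO Enumerated_literal_data_type VALUES (?, 0, ?, NULL)",
                     [(name, lit) for lit in literals])

# Enumerations with exactly two literals, both drawn from {'True', 'False'}.
matches = [r[0] for r in conn.execute("""
    SELECT type_name FROM Enumerated_literal_data_type
    GROUP BY type_name, constraint_id
    HAVING COUNT(*) = 2 AND SUM(literal IN ('True', 'False')) = 2
    ORDER BY type_name""")]
print(matches)  # ['Boolean_string', 'Flag_string']
```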

To illustrate how a data type can be represented in the database, consider the following Tosca example:

data_types:
  d1:
    derived_from: d2
    description: an illustrative example
    properties:
      alpha:
        type: integer
        constraints:
          - valid_values: [ 4, 5, 45 ]
      beta:
        type: list
        entry_schema: string
      gamma:
        type: float

Data type d1 has two fields (alpha and beta) that are of anonymous types. There are many ways to manage the metadata associated with reconstructing textual representations that contain anonymous types, but for purposes of explanation it can be assumed that the data_types table has the following field in addition to those described in the example schema above:

is_anonymous_type boolean NOT NULL,

Data type d1 is derived_from d2, is a struct type, and has a description. Therefore the SQL command for recording data type d1 is:

INSERT INTO Data_types(name, derived_from, description, data_type_class, is_symbol_type,
                       is_concrete_type, is_carrier_type, is_anonymous_type,
                       key_data_type, value_data_type)
  VALUES ('d1', 'd2', 'an illustrative example', 'struct', false, false, false, false, null, null);

The fields of d1 can be inserted into the Fields table after the anonymous types are created. Field alpha is an integer with constraints on the values it can take. It can be created as follows:

INSERT INTO Data_types(name, derived_from, description, data_type_class, is_symbol_type,
                       is_concrete_type, is_carrier_type, is_anonymous_type,
                       key_data_type, value_data_type)
  VALUES ('anon_d1_alpha', 'integer', null, 'basic', true, false, false, true, null, null);
INSERT INTO Data_type_constraint(type_name, id, description, predicate_type, literal)
  VALUES ('anon_d1_alpha', 0, null, 'enumeration', null);
INSERT INTO Enumerated_literal_data_type(type_name, constraint_id, literal, description)
  VALUES ('anon_d1_alpha', 0, '4', null),
         ('anon_d1_alpha', 0, '5', null),
         ('anon_d1_alpha', 0, '45', null);

For this data type ingest, the Tosca carrier integer type is mapped to the symbol integer type. So the anonymous type is also a symbol type, but one which has additional constraints. Field beta is a container type, and is added using:

INSERT INTO Data_types(name, derived_from, description, data_type_class, is_symbol_type, is_concrete_type, is_carrier_type, is_anonymous_type, key_data_type, value_data_type)

VALUES ('anon_d1_beta', null, null, 'list', false, false, false, true, 'string', null);

The fields of d1 are inserted using:

INSERT INTO Fields(name, field_of, description, type_name, default_value, required)
  VALUES ('alpha', 'd1', null, 'anon_d1_alpha', null, true),
         ('beta', 'd1', null, 'anon_d1_beta', null, true),
         ('gamma', 'd1', null, 'float', null, true);

Reconstructing the data type into a particular representation language depends on the specifics of the language. To render d1 back into Tosca, the d1 record is fetched from Data_types; the derived_from and description fields are created; and then the code path corresponding to a data_type_class of ‘struct’ is executed. This involves fetching all Fields records where field_of = 'd1' and filling out the name, description, required, default, and type fields. Since the types of alpha and beta are anonymous (have is_anonymous_type = true), the description of each corresponding anonymous type replaces the reference to it. The three fields are then made children of the “properties” of d1.

Next, consider an example where a constraint is added to a field of the parent class:

data_types:
  d0:
    derived_from: d1
    description: example of constrained fields
    constraints:
      gamma:
        Max_exclusive: 100.0

Value constraints do not apply to struct data types, so a new anonymous data type must be created:

INSERT INTO Data_types(name, derived_from, description, data_type_class,is_symbol_type, is_concrete_type, is_carrier_type, is_anonymous_type,key_data_type, value_data_type)

VALUES ('anon_d0_gamma', 'float', null, 'basic', true, false, false, true, null, null);

INSERT INTO Data_type_constraint(type_name, id, description,predicate_type, literal)

VALUES ('anon_d0_gamma', 0, null, 'max_exclusive', '100.0');

Now d0 and its field can be inserted:

INSERT INTO Data_types(name, derived_from, description, data_type_class,is_symbol_type, is_concrete_type, is_carrier_type, is_anonymous_type,key_data_type, value_data_type)

VALUES ('d0', 'd1', 'example of constrained fields', 'struct', false, false, false, false, null, null);

INSERT INTO Fields(name, field_of, description, type_name,default_value, required)

VALUES ('gamma', 'd0', null, 'anon_d0_gamma', null, true);

The data type of d0.gamma is a derived_from descendent of the data type of d1.gamma, so the rules for overriding parent fields are not violated.

In addition to the tables which define data types, fields, and their constraints, additional tables can store relationships between data types (e.g., relationship tables 750). Examples include carrier_type_of, rep_by, and concrete_type_of. A basic representation of these tables (excluding metadata such as timestamps, identity of the record author, and so on) is just the source and target data type names. For example:

CREATE TABLE concrete_type_of (
  symbol_type text NOT NULL,
  replacement_type text NOT NULL,
  description text,
  CONSTRAINT pk_concrete_type_of PRIMARY KEY (symbol_type, replacement_type),
  FOREIGN KEY (symbol_type) REFERENCES Data_types(name) ON DELETE CASCADE,
  FOREIGN KEY (replacement_type) REFERENCES Data_types(name) ON DELETE CASCADE
);

The tables in data model database 700 which describe data types have a great deal of referential integrity, as indicated by the FOREIGN KEY SQL DDL constraints. Every enumerated literal must be associated with a constraint, every constraint must be associated with a data type, and every field must be associated with a data type.

In addition, data types have references to each other. These references can be direct (derived_from, key_data_type, value_data_type) or indirect (type_name in a Fields record that joins to the data type record). These references ensure data integrity.

In the following portion of the disclosure, discussing deletion safety and other aspects, the following notations are used:

-   Data type: lowercase italic, e.g. d.
-   Collection of data types: uppercase bracket, e.g. [D].
-   Dependency tree: uppercase italic, e.g. D.
-   Data type name: lowercase plain, e.g. d.
-   Collection of data type names: uppercase plain, e.g. D.

The complex dependency chains can make the process of deleting a data type from the database complicated. A collection of data types [D] is deletion-safe if every data type that references a data type in [D] is also in [D]. Deletion-safety can be determined by finding all data types that refer to a data type in [D] (via derived_from, key_data_type, value_data_type, or via a field) and verifying that this set is a subset of [D].

Given data type d, its safe-deletion set, safe-deletion(d), is the minimal set of data types [D] that contains d and is deletion-safe. Potentially, this is the entire database. This set can be found by starting at d and following dependency links until no new data types are reached. While those of ordinary skill in the art will appreciate more than one way to code an appropriate algorithm, one example provides:

Given d:
Set [D] = emptyset; Set [Dnew] = {d};
While [Dnew] is not empty:
  Find all d′ such that d′.derived_from is in [Dnew] and d′ is not in [D], putting these d′ into [Ddrv];
  Find all d′ such that d′ is a struct, d′ has at least one field f whose type is in [Dnew], and d′ is not in [D], putting these d′ into [Dfld];
  Find all d′ such that d′ is a list, set, bag, or map, d′.key_data_type is in [Dnew], and d′ is not in [D], putting these d′ into [Dkey];
  Find all d′ such that d′ is a map, d′.value_data_type is in [Dnew], and d′ is not in [D], putting these d′ into [Dval];
  Set [D] = [D] union [Dnew]; and
  Set [Dnew] = [Ddrv] union [Dfld] union [Dkey] union [Dval].

This algorithm can be implemented in many ways and in many languages, making greater or lesser use of SQL.

For an example, consider computing safe-deletion(d1) for a collection of data types in which d1 is derived_from d0; d2 is derived_from d1; d3 has a field of type d1; d5 is a list with key type d1; d6 is derived_from d2; and d4 is derived_from d3. In the first iteration, d2, d3, and d5 are put into [Dnew] because d2 is derived_from d1, d3 has a field of type d1, and d5 is a list of type d1. In the next round, d4 and d6 are put into [Dnew] because they depend on data types in [Dnew]. At this point the iteration stops. Type d0 is not in safe-deletion(d1) because it does not depend on d1, either directly or transitively.
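The fixed-point iteration can be sketched in Python. It assumes a hypothetical dict-of-records layout and, rather than four separate searches, collects every reference (derived_from, key_data_type, value_data_type, and field types) in one helper. The usage data is a small hypothetical collection in which d2 is derived_from d1, d3 has a field of type d1, d5 is a list of d1, d6 is derived_from d2, d4 is derived_from d3, and d1 is derived_from d0.

```python
def safe_deletion(d, db):
    """Smallest deletion-safe set of type names that contains d."""

    def refs(t):
        """Type names that t refers to via derived_from, key/value types, or fields."""
        rec = db[t]
        out = {rec.get("derived_from"), rec.get("key_data_type"), rec.get("value_data_type")}
        out.update(rec.get("fields", {}).values())
        out.discard(None)
        return out

    done, new = set(), {d}
    while new:
        done |= new
        # Anything not yet collected that refers to a newly added type must go too.
        new = {t for t in db if t not in done and refs(t) & new}
    return done


db = {
    "d0": {},
    "d1": {"derived_from": "d0"},
    "d2": {"derived_from": "d1"},
    "d3": {"fields": {"x": "d1"}},
    "d4": {"derived_from": "d3"},
    "d5": {"key_data_type": "d1"},   # e.g. a list of d1
    "d6": {"derived_from": "d2"},
}
print(sorted(safe_deletion("d1", db)))  # ['d1', 'd2', 'd3', 'd4', 'd5', 'd6']
```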

In normal usage, data types are deprecated rather than deleted, as is discussed herein.

In many cases, all chains of data type dependencies form a Directed Acyclic Graph. For many of the dependencies, basic constraints ensure the acyclicity of the dependency chains. All derived_from dependency chains must be acyclic. This property can be ensured if data types are entered one at a time, as the referenced data type must already exist. Dependencies via container types and derived_from dependency chains must be acyclic. A container type cannot contain itself, e.g., type A cannot be defined as list<A>, and this constraint is checked locally. A subtype of an existing data type must have the same essential properties as the parent, so if a subtype is a container, its parent must also be a container. A subtype of a container type can contain a subtype of the type contained by the parent, but subtyped container types cannot induce cyclic dependencies since (working backwards) there would have to be some parent that contains itself.

Many schema languages, e.g., XSD and JSON Schema, have explicit capabilities for cyclic dependencies. These dependencies come from non-basic struct data types (data types that have fields). Data types that contain cyclic dependencies create complications for many utilities and processes, including deletion and hierarchical display of nested data type structure. Furthermore, it is possible to define data types whose constraints require that any valid instantiation have infinite nesting, and this possibility must be excluded.

To check for cyclic dependencies, a dependency tree can be built, which shows data type dependencies through composition. To help in the uniform construction of the dependency tree, container data types can be treated as having fields. List, set, bag, and map containers all have a field with a special name, e.g., $key$, whose type is indicated by the key_data_type field of the data_type record. In addition, map data types have a field with a special name, e.g., $value$, whose type is indicated by the value_data_type field of the data_type record.

For the initial description of the dependency tree, it can be assumed that there are no derived_from relationships among any of the types in the dependency tree. In this case, the dependency tree of data type d, dep_tree(d) or D, is built recursively according to the following rules:

A node in the dependency tree is constructed from a data type by listing the following information:

-   The name of the data type;
-   Whether or not the data type is a container type;
-   The value of the minlength constraint, which:
    -   is set to 1 for non-container (struct, basic) types; and
    -   otherwise has the default value 0 if no minlength constraint is present;
-   For each field:
    -   The name of the field;
    -   The name of the field data type;
    -   Whether the field is required (in the Fields record), where $key$ and $value$ are always required; and
    -   A pointer (perhaps null) to another dependency tree node.

The root n of the dependency tree is constructed from d.

A field entry of a data type node n′, with data type d′, is expanded by constructing a dependency tree node from d′, and pointing to the newly constructed dependency tree node, as long as the following two conditions hold:

-   The data_type_class of d′ is not basic; and
-   d′ is not d, and does not appear in the path from n to n′.

The construction of the dependency tree is completed when no field entry in any node of the tree can be expanded.

The dependency tree of d can be computed by various recursive tree construction techniques.

For an example of a dependency tree, the example data type d1 can be used. A dependency tree of d1 can include the following details:

Type name d1 is not a container and has a minimum length of 1, and is defined by fields alpha (anon_d1_alpha), which is required, beta (anon_d1_beta), which is required, and gamma (float), which is required;

Type anon_d1_beta, reached from field beta of d1, is a container with a minimum length of 0, and is defined by $key$ (string), which is required.

Data type d1 is a struct, so it is not a container and has min length 1. Type d1 has three fields, alpha, beta, and gamma. Fields alpha and beta have anonymous types, anon_d1_alpha and anon_d1_beta, respectively. Types anon_d1_alpha and float are basic and therefore are not expanded. Type anon_d1_beta is composite, so a node corresponding to anon_d1_beta is created and linked to field beta. Type anon_d1_beta is a container, with min_length 0.

A list container has a field, $key$, and since field beta is a list of strings, the data type of $key$ is string, which is basic, so no more expansions are performed.
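A recursive construction of the dependency tree can be sketched as follows, reproducing the d1 example. The record layout (class, key, value, min_length, and (name, type, required) field triples) is a hypothetical in-memory stand-in for the Data_types, Fields, and constraint tables.

```python
from dataclasses import dataclass, field

CONTAINERS = {"list", "set", "bag", "map"}


@dataclass
class DepNode:
    name: str
    is_container: bool
    min_length: int
    # One entry per field: [field name, field type, required, child DepNode or None]
    fields: list = field(default_factory=list)


def field_entries(rec):
    """Declared fields plus the $key$/$value$ pseudo-fields of containers."""
    entries = list(rec.get("fields", []))
    if rec["class"] in CONTAINERS:
        entries.append(("$key$", rec["key"], True))         # always required
        if rec["class"] == "map":
            entries.append(("$value$", rec["value"], True))
    return entries


def dep_tree(d, db, path=()):
    """Build dep_tree(d); path holds the types from the root down to this node."""
    rec = db[d]
    is_container = rec["class"] in CONTAINERS
    node = DepNode(d, is_container, rec.get("min_length", 0) if is_container else 1)
    path = path + (d,)
    for fname, ftype, required in field_entries(rec):
        child = None
        # Expand only non-basic types that do not already occur on the path.
        if db[ftype]["class"] != "basic" and ftype not in path:
            child = dep_tree(ftype, db, path)
        node.fields.append([fname, ftype, required, child])
    return node


db = {
    "d1": {"class": "struct", "fields": [("alpha", "anon_d1_alpha", True),
                                          ("beta", "anon_d1_beta", True),
                                          ("gamma", "float", True)]},
    "anon_d1_alpha": {"class": "basic"},
    "anon_d1_beta": {"class": "list", "key": "string"},
    "float": {"class": "basic"},
    "string": {"class": "basic"},
}
tree = dep_tree("d1", db)
print([(f[0], f[3].name if f[3] is not None else None) for f in tree.fields])
# [('alpha', None), ('beta', 'anon_d1_beta'), ('gamma', None)]
print(tree.fields[1][3].min_length, tree.fields[1][3].fields)
# 0 [['$key$', 'string', True, None]]
```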

A completed dependency tree will have leaves that are either basic types, or non-basic types that have a repeat occurrence on the root-to-leaf path. Therefore a data type is recursive if at least one of its leaves is not basic; otherwise, the data type is non-recursive.

A data type d is recursive because it is, or depends on, some data type d′ that transitively references itself. These data types d′ are the core of the recursive data types, so it can be useful to explicitly name them. A data type d is root-recursive if one of the leaves of its dependency tree has data type d. Recursive data types are common in tree-structured languages such as XSD or JSON Schema. Consider an example of a recursive data type using a skeletonized representation of the data types that lists only fields and their types for brevity:

d0:

alpha: d1

beta: string

d1:

gamma: d2

delta: integer

d2:

epsilon: d3

zeta: float

d3:

$key$: d1

Data type d1 occurs twice in the path from d0 through d3, inclusive. Therefore d0 is recursive, and d1 is root-recursive.
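The leaf-based definitions can be checked directly on the skeleton above. This sketch assumes a hypothetical mapping from each non-basic type to its field types, with any type missing from the mapping treated as basic; it walks the dependency tree without materializing nodes.

```python
# Skeleton from the example: non-basic type -> types of its fields.
# d3 is a list container, so its single entry is the type of its $key$ field.
SKELETON = {
    "d0": ["d1", "string"],
    "d1": ["d2", "integer"],
    "d2": ["d3", "float"],
    "d3": ["d1"],
}


def leaf_types(d, db, path=()):
    """Yield the type of every leaf in the dependency tree of d."""
    path = path + (d,)
    for ftype in db[d]:
        if ftype not in db or ftype in path:   # basic type, or a repeat on this path
            yield ftype
        else:
            yield from leaf_types(ftype, db, path)


def is_recursive(d, db):
    """Recursive: some leaf is non-basic, i.e. the walk hit a repeated type."""
    return any(t in db for t in leaf_types(d, db))


def is_root_recursive(d, db):
    """Root-recursive: some leaf has type d itself."""
    return d in set(leaf_types(d, db))


for name in SKELETON:
    print(name, is_recursive(name, SKELETON), is_root_recursive(name, SKELETON))
# d0 True False
# d1 True True
# d2 True True
# d3 True True
```

Under these definitions d2 and d3 are root-recursive as well, since they lie on the same cycle as d1.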

An alternative way to represent and analyze dependencies is to build a dependency graph instead of a dependency tree. The difference is that every data type in the database is a node and every field of a non-basic type is linked to its dependent data type. The root-recursive data types are the nodes in a cycle, and the recursive data types are the nodes that are connected, directly or transitively, to a root-recursive node. Various algorithms can be used for finding all cycles in a graph, and all nodes that lead to a cycle. While dependency trees are discussed more frequently in this disclosure, it is understood that a dependency graph can be used to speed up the detection of possibly problematic data types.
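For the graph variant, a simple reachability-based sketch is enough to illustrate the definitions; a production version would more likely use a linear-time strongly connected components algorithm such as Tarjan's. The graph below is hypothetical and contains only edges to non-basic types.

```python
def reachable(graph, start):
    """Every node reachable from start by following one or more edges."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def classify(graph):
    """Return (root_recursive, recursive) node sets of a dependency graph."""
    reach = {n: reachable(graph, n) for n in graph}
    root_recursive = {n for n in graph if n in reach[n]}      # n lies on a cycle
    recursive = {n for n in graph if n in root_recursive or reach[n] & root_recursive}
    return root_recursive, recursive


# Edges run from a non-basic type to the non-basic types of its fields.
graph = {"d0": ["d1"], "d1": ["d2"], "d2": ["d3"], "d3": ["d1"], "d4": ["d0"], "d5": []}
root_rec, rec = classify(graph)
print(sorted(root_rec))  # ['d1', 'd2', 'd3']
print(sorted(rec))       # ['d0', 'd1', 'd2', 'd3', 'd4']
```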

If the data types in the dependency tree have derived_from relationships, then the conditions for expansion and the definitions of recursive and root-recursive data types remain the same. The reasons can be divided into two cases. First: there is a path from d′ to d″ in the dependency tree and d″ is transitively derived_from d′. Then, since d″ contains all fields of d′, including the field that leads from d′ to d″, there is a cycle in the dependencies among the data types. But this cycle will be found in a path from d″ to d″ in the dependency tree. Type d′ will be found to be recursive (and perhaps root-recursive via a different path), and d″ will be found to be root-recursive. Second: there is a path from d′ to d″ in the dependency tree and d′ is transitively derived_from d″. There is a cycle in the dependencies if the field of d′ exists in d″. But in this case there will be a continuation of the path to another d″, the cycle will be found, and d″ will be found to be root-recursive.

If a collection of data types is recursive, then by definition there exists at least one cycle among the dependency trees, and therefore there is at least one root-recursive data type. It is possible for such a collection of data types to have constraints such that any instance satisfying the constraints of the data type(s) in the cycle must be infinitely large.

A node in the dependency tree is optional if either of the following holds:

its parent field entry (if any) has required=False; or

it is a container type with a constraint min_length=0.

A root-recursive data type d is root-recursion valid if every path in the dependency tree of d to a leaf with type d has at least one optional data type; otherwise it is root-recursion invalid. A collection of data types [D] is root-recursion valid if every root-recursive data type is root-recursion valid. A data type d′ is recursion invalid if there is a root-recursion invalid data type in its dependency tree; otherwise it is recursion valid.

A collection of data types that is recursion valid has finite-sized instances that satisfy the constraints, if there are any instances that satisfy the constraints (the constraints might be unsatisfiable). Various algorithms can verify whether or not a root-recursive data type is root-recursion valid; one example is a tree search algorithm.

For an example of a data type that is not recursion valid, the running example can be augmented with required values in the fields and min length values for the container types. Type d1 is root-recursive, providing a starting point. The path that leads to d1 again goes through field d1.gamma, which is required, then through d2.epsilon, which is also required. Type d3 is a container type of d1, and its min length is 2. Therefore any data instance satisfying these constraints must be infinitely large, so d1 is root-recursion invalid (and so are d2 and d3). Type d1 is in the dependency tree of d0, so d0 is recursion invalid.
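A root-recursion validity check can be sketched as a depth-first search that remembers whether an optional node has been passed since leaving the root. The catalog format below is a hypothetical encoding of the augmented running example. The required flags on beta, delta, and zeta are assumptions that do not affect the cycle, and $key$ is assumed required so that the container's min_length alone governs it.

```python
# Hypothetical augmented catalog: type -> {"fields": {name: (type, required)}, "min_length": n}.
# Structs have min_length 1; d3 is a container of d1 with min_length 2.
CATALOG = {
    "d0": {"fields": {"alpha": ("d1", True), "beta": ("string", True)}, "min_length": 1},
    "d1": {"fields": {"gamma": ("d2", True), "delta": ("integer", True)}, "min_length": 1},
    "d2": {"fields": {"epsilon": ("d3", True), "zeta": ("float", True)}, "min_length": 1},
    "d3": {"fields": {"$key$": ("d1", True)}, "min_length": 2},
}


def is_optional(catalog, t, required):
    # Optional if the parent field has required=False, or t is a container with min_length 0.
    return (not required) or catalog.get(t, {}).get("min_length", 1) == 0


def root_recursion_valid(catalog, root):
    """True iff every path from root to a leaf of type root passes an optional node."""
    stack = [([root], False)]  # (path of type names, optional node seen below root?)
    while stack:
        path, saw_optional = stack.pop()
        t = path[-1]
        if t in path[:-1]:  # repeated type: this node is a leaf
            if t == root and not saw_optional:
                return False
            continue
        for child, required in catalog.get(t, {}).get("fields", {}).values():
            stack.append((path + [child], saw_optional or is_optional(catalog, child, required)))
    return True


def recursion_valid(catalog, root):
    """True iff no type in root's dependency tree is root-recursion invalid."""
    seen, stack = set(), [root]
    while stack:
        t = stack.pop()
        if t in catalog and t not in seen:
            seen.add(t)
            stack.extend(child for child, _ in catalog[t]["fields"].values())
    return all(root_recursion_valid(catalog, t) for t in seen)


print([root_recursion_valid(CATALOG, t) for t in ("d1", "d2", "d3")])  # [False, False, False]
print(recursion_valid(CATALOG, "d0"))  # False
```

Relaxing d3 to min_length 0 makes d3 optional, and the same check then reports d0 as recursion valid.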

Data model database 700 supports extensive versioning. The interconnected nature of data types makes in-place changes dangerous, as a change to data type d can cause a change to a seemingly unrelated data type e that depends on d because, e.g., either e is transitively derived_from d, or d is in the dependency tree of e.

Referential integrity requires that if a data type d is deleted, all data types e that depend on d must also be deleted. This can be a very large set of data types, and many of them might be in active use.
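The set that would have to be deleted is the transitive closure of the reverse dependency relation. A minimal sketch, assuming a hypothetical in-memory map from each type to the types it references through fields or derived_from:

```python
from collections import defaultdict


def dependents(depends_on, target):
    """Every type that depends on target directly or transitively.

    depends_on maps a type to the set of types it references through
    its fields or its derived_from relationship.
    """
    users = defaultdict(set)  # inverted edges: referenced type -> referencing types
    for t, refs in depends_on.items():
        for r in refs:
            users[r].add(t)
    out, stack = set(), [target]
    while stack:
        for u in users[stack.pop()]:
            if u not in out:
                out.add(u)
                stack.append(u)
    return out


# Hypothetical catalog: e uses d, f is derived_from e, h has a field of type f.
DEPENDS_ON = {"e": {"d"}, "f": {"e"}, "g": {"string"}, "h": {"f", "g"}}
print(sorted(dependents(DEPENDS_ON, "d")))  # ['e', 'f', 'h']
```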

In normal usage, data types are “write-once,” i.e., never modified or deleted. Rather than being modified or deleted, data types are versioned, and old versions can be deprecated.

The version of a data type is based on an ordered numbering system. One option for a version number is to use an integer, starting with 1. However, many software developers use semantic versioning, which consists of three integers usually written as major.minor.patch, e.g., version 1.1.2. Ordering is performed lexicographically, starting with major, breaking ties with minor, and breaking ties again with patch. Semantic version numbers often start at 0.1.0.
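Python's tuple comparison is already lexicographic, so semantic version ordering reduces to parsing. This sketch assumes plain major.minor.patch strings; the pre-release and build suffixes permitted by the Semantic Versioning specification are not handled. It also shows why comparing the raw strings gives the wrong order.

```python
def parse_semver(text):
    """'major.minor.patch' -> (major, minor, patch) as integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


versions = ["1.10.0", "0.1.0", "1.9.3", "1.1.2"]
# Tuples compare lexicographically: major, then minor, then patch.
print(sorted(versions, key=parse_semver))  # ['0.1.0', '1.1.2', '1.9.3', '1.10.0']
print(max(versions, key=parse_semver))  # 1.10.0
print(max(versions))  # 1.9.3 (plain string comparison is wrong here)
```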

With versioning, the unique identifier of a data type is no longer its name, but rather the combination (name, version). Furthermore, all references to a data type, whether from the data_type table or from the subsidiary fields or data_type_constraint tables, must be the pair (name, version). For database efficiency, and to simplify queries, a common best practice is to use an integer field (say, id) as the primary key of the table (e.g., using the serial data type in Postgres) and to declare a UNIQUE constraint on the pair (name, version). Then all REFERENCES constraints are to the id field in the data_type table.

Given a data type name d, the collection of all data_type records in the database with name=d are the versions of d. The version of d with the largest version number is the current version of d, and the other versions of d are previous versions of d.
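The keying scheme and the notion of a current version can be sketched with Python's built-in sqlite3 module. SQLite's INTEGER PRIMARY KEY AUTOINCREMENT stands in for the Postgres serial type, versions are assumed to be integers so that MAX() yields the current version, and the column names of the fields table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE data_type (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- plays the role of a Postgres serial
    name    TEXT    NOT NULL,
    version INTEGER NOT NULL,                   -- integer versions starting at 1
    UNIQUE (name, version)
);
CREATE TABLE fields (
    data_type_id  INTEGER NOT NULL REFERENCES data_type (id),
    field_name    TEXT    NOT NULL,
    field_type_id INTEGER NOT NULL REFERENCES data_type (id)
);
""")
conn.executemany(
    "INSERT INTO data_type (name, version) VALUES (?, ?)",
    [("d1", 1), ("d1", 2), ("d3", 1), ("d3", 5)],
)

# The current version of each name is the one with the largest version number.
CURRENT = "SELECT name, MAX(version) FROM data_type GROUP BY name ORDER BY name"
print(conn.execute(CURRENT).fetchall())  # [('d1', 2), ('d3', 5)]

try:
    conn.execute("INSERT INTO data_type (name, version) VALUES ('d1', 2)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # the UNIQUE (name, version) constraint fires
```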

The definition of the dependency tree must be modified slightly to account for versions. Each data type name is now a pair (name, version). The expansion rule now requires a match on both name and version to stop the expansion of a data type node. While a data type named d might appear multiple times in a root-to-leaf path, if there are cyclic dependencies then eventually there will be a cycle with a particular version of d.

In the presence of versioned data types, version management becomes an important issue. Because of the interconnected nature of data types, a version change in one data type can affect the versioning properties of many other data types. Let d be a data type, let D be its dependency tree, and consider the set of data type names that appear in D.

A data type d is up-to-date if d is current and, for each d′ in D, d′ is current. Otherwise, d is obsolete. A data type d can be determined to be up-to-date or obsolete by constructing its dependency tree and, for every node, checking whether the data type is current.

A data type d with name d and version v can be marked as deprecated, meaning that it should not be in active use. If d is deprecated, then every data type d′ with name d and version v′ <= v is also deprecated. The collection of data types that are deprecated can be summarized with a table that maps each data type name d to the largest version v that is deprecated, if any.

A data type d is transitively deprecated if d is deprecated, or if there is a d′ in D such that d′ is deprecated. A data type d can be determined to be transitively deprecated by building its dependency tree and checking each node to determine whether the data type represented by the node is deprecated.

A data type d is version-consistent if, for each data type name d′ appearing in D, every node in D that represents a data type with name d′ has the same version. Otherwise, d is version-inconsistent.

The algorithms for computing whether a data type d is up-to-date, transitively deprecated, or version-consistent consist of building the dependency tree D of d and performing a search on D for the indicated properties. Various algorithms can be employed to search the dependency tree.

For an example of these concepts, consider a simplified dependency tree with eight nodes: (d1, version 1); (d2, version 2), (d3, version 5), and (d4, version 2), which depend from (d1, version 1); (d5, version 3) and (d6, version 2), which depend from (d2, version 2); and (d3, version 1) and (d5, version 3), which depend from (d4, version 2). For the purpose of the discussion, all nodes in the tree are current except for (d3, version 1). Then (d1, version 1) is obsolete because (d3, version 1) is not current. Type (d1, version 1) is transitively deprecated because (d3, version 1) is deprecated. Finally, (d1, version 1) is version-inconsistent because it depends on two different versions of d3.
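Each of the three checks is a single traversal of D. A minimal sketch, assuming integer versions, a hypothetical map from each name to its current version, and the deprecation summary table described above; it reproduces the eight-node example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    version: int
    children: list = field(default_factory=list)


def nodes(tree):
    """Depth-first walk over every node of a dependency tree."""
    yield tree
    for child in tree.children:
        yield from nodes(child)


def up_to_date(tree, current):
    # Every node, the root included, must be the current version of its name.
    return all(current[n.name] == n.version for n in nodes(tree))


def transitively_deprecated(tree, deprecated):
    # deprecated: name -> largest deprecated version; every version <= it is deprecated.
    return any(n.version <= deprecated.get(n.name, 0) for n in nodes(tree))


def version_consistent(tree):
    # Every node carrying a given name must carry the same version.
    first_seen = {}
    return all(first_seen.setdefault(n.name, n.version) == n.version for n in nodes(tree))


TREE = Node("d1", 1, [
    Node("d2", 2, [Node("d5", 3), Node("d6", 2)]),
    Node("d3", 5),
    Node("d4", 2, [Node("d3", 1), Node("d5", 3)]),
])
CURRENT = {"d1": 1, "d2": 2, "d3": 5, "d4": 2, "d5": 3, "d6": 2}
DEPRECATED = {"d3": 1}  # summary table: name -> largest deprecated version
print(up_to_date(TREE, CURRENT), transitively_deprecated(TREE, DEPRECATED), version_consistent(TREE))
# False True False
```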

A data type d that is obsolete can, in principle, be made up-to-date by replacing each reference to an obsolete data type with the current data type. However, the nested and possibly cyclic nature of data type cross-references adds some complications to this process.

An algorithm for computing an up-to-date version of a data type d, where d is current, is as follows. Let D be the dependency tree of d:

1. Do:
   a. Build the dependency tree D of d;
   b. Make a search of D starting at d, processing parents before children;
   c. At each node d′ with parent p in the search:
      i. If d′ is not current:
         1. Let d″ be the current version of d′;
         2. For each node n on the path from d′ to d:
            a. Create a new current node n′ if one has not already been created;
            b. Replace the reference to n with n′ in p′, the parent of n;
         3. Replace the reference to d′ with a reference to d″ in p.
2. Until D does not change.

Such an algorithm replaces references to obsolete data types with current data types. However, doing so requires a change to the parent: replacing the reference to d′ with d″ in the parent p changes p. If p is not a newly created version, a new version of p must be created and made the current version. This rule applies recursively up to the root d.

When the algorithm replaces a previous node with a current node, it does not recompute the dependency subtree rooted at the replacement node. If the replacement node is current but not up-to-date, then an up-to-date version of the replacement node must be created. This is handled by repeating the transformation of D until there is no further change. An alternative algorithm builds the dependency subtree of the replacement node and continues the search for previous nodes in that subtree. Alternative algorithms can be used in various embodiments.

Consider an example of computing an up-to-date version of (d1, version 1) from the tree above. While performing the search of the dependency tree of (d1, version 1), node (d3, version 1) is found to be out-of-date. Therefore, all ancestors of (d3, version 1) must be replaced with new versions: (d1, version 1) is replaced by (d1, version 2), and (d4, version 2) by (d4, version 3). The resulting tree contains (d1, version 2); (d2, version 2), (d3, version 5), and (d4, version 3), which depend from (d1, version 2); (d5, version 3) and (d6, version 2), which depend from (d2, version 2); and (d3, version 1) and (d5, version 3), which depend from (d4, version 3). Finally, (d3, version 1) is replaced by (d3, version 5), making (d1, version 2) up-to-date, as well as version-consistent and no longer transitively deprecated.

If (d3, version 5) were not basic, another pass of building the dependency tree and replacing previous data types would be required.
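A Python sketch of the up-to-date computation follows. It is a recursive variant rather than the repeat-until-no-change loop: each reference is redirected to its current version and that subtree is refreshed immediately, which corresponds to the alternative of building the replacement node's subtree and continuing the search. The (name, version) store and field names are hypothetical, new versions are numbered current + 1, and acyclic types are assumed. Run on the example tree, it produces (d4, version 3) and (d1, version 2).

```python
def current_version(db, name):
    """Largest version number stored for name."""
    return max(v for (n, v) in db if n == name)


def make_up_to_date(db, root):
    """Return the key of an up-to-date version of root, adding new versions to db.

    db maps (name, version) -> {field: (type name, version)}; basic types map
    to {}. Every reference is redirected to the current version of its type,
    and each type whose references change is copied into a new current
    version. Assumes the types are acyclic.
    """
    memo, active = {}, set()

    def visit(key):
        if key in memo:
            return memo[key]
        if key in active:
            raise ValueError(f"cyclic reference through {key}")
        active.add(key)
        old = db[key]
        # Point each field at the current version, then refresh that subtree.
        new = {f: visit((name, current_version(db, name))) for f, (name, _) in old.items()}
        if new == old:
            result = key
        else:
            result = (key[0], current_version(db, key[0]) + 1)
            db[result] = new
        active.discard(key)
        memo[key] = result
        return result

    return visit(root)


# The example tree (hypothetical field names a, b, c, x, y, p, q).
DB = {
    ("d1", 1): {"a": ("d2", 2), "b": ("d3", 5), "c": ("d4", 2)},
    ("d2", 2): {"x": ("d5", 3), "y": ("d6", 2)},
    ("d3", 1): {}, ("d3", 5): {},
    ("d4", 2): {"p": ("d3", 1), "q": ("d5", 3)},
    ("d5", 3): {}, ("d6", 2): {},
}
new_root = make_up_to_date(DB, ("d1", 1))
print(new_root)  # ('d1', 2)
print(DB[("d4", 3)])  # {'p': ('d3', 5), 'q': ('d5', 3)}
```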

This creation of an up-to-date version of d can involve unintended changes to d and to other parts of the database. Therefore the result should be subject to review and not automatically installed into the database.

Data model database 700 also provides namespaces for different groups. In a large organization, there can be many groups developing data types, and many levels of “official” data types. Different groups may independently use the same name for different data types. Therefore it is convenient to have different groups work in different namespaces.

To accommodate namespaces, the data type name can be replaced with the pair (namespace, type_name). When combined with versioning, the primary key of a data type is (namespace, type_name, version). Therefore, foreign key references are triples (namespace, type_name, version). For database efficiency, and to simplify queries, a common best practice is to use an integer field (say, id) as the primary key of the table (e.g., using the serial data type in Postgres) and to declare a UNIQUE constraint on the triple (namespace, type_name, version). Then all REFERENCES constraints are to the id field in the data_type table.

Issues may arise when transferring data types between namespaces. For example, a data type in one namespace can reference a data type in another. Other issues can arise when making a bulk transfer from namespace NS1 to namespace NS2, which involves collecting the dependency tree, collecting distinct types (including multiple versions of the same type), forming a replacement list of types in NS2 to replace types from NS1, moving all unreplaced types from NS1 to NS2, making appropriate changes in type references and naming, and creating linkages for moved types in a “created from” table.

Other kinds of transformations make use of the dependency tree. For example, a symbol data type can be transformed to a pure concrete data type. The concrete_type_of relationship provides substitutions of (more) concrete data types for basic symbol data types. Various algorithms can be used for computing the set of complete acyclic expansions of a symbol data type. For basic symbol data type d, each child of d in acyclic_expansion(d) is a valid substitution for d, and each child that is not marked incomplete produces a pure concrete substitution of d.

For a given symbol data type, replacements can be found. Let dep_tree(d) be the dependency tree of d. The algorithm for transforming d from symbol to pure concrete is:

 1. Make dc a copy of d but with a changed name.
 2. Compute dep_tree(dc).
 3. Let Ddep be the set of basic symbol types in the dependency tree of dc (found at the leaves of dep_tree(dc)).
 4. While Ddep is not empty:
    a. Find Rdep, which maps elements of Ddep to a replacement dependency subtree that is pure concrete (alternatively, incomplete substitutions can be made). Rdep is found by:
       i. For every d′ in Ddep:
          1. Choose a child d of d′ in acyclic_expansion(d′) that is not marked incomplete and put (d′, d) in Rdep.
          2. Alternatively, the chosen d can be one that is marked incomplete.
    b. Compute an updated version of dc as follows:
       i. For every leaf I in dep_tree(dc) such that I is basic symbol:
          1. For each node n on the root-to-leaf path from dc to I:
             a. If n was not changed from dep_tree(dc):
                i. Create a new node n′ which is equal to n but with a new name.
                ii. Replace the reference to n with a reference to n′ in p′, the parent of n.
          2. Replace I with Rdep(I) in p, the parent of I.
    c. Recompute dep_tree(dc) and, from it, Ddep.

The replacement of basic symbol data types with concrete replacements is similar to the replacement of obsolete data types with up-to-date data types described herein. This algorithm is one example that replaces basic symbol data types at the leaves of a dependency tree, but others can be used without departing from the scope or spirit of the innovation. The replacement data types might directly or transitively depend on other basic symbol data types. Therefore iteration is required until all basic symbol data types have been replaced. An alternative algorithm transforms the replacement dependency trees before performing the replacement. A recursive algorithm might be more efficient in certain embodiments.
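A recursive variant of this transformation (the text notes that a recursive algorithm may be more efficient) can be sketched in Python. Only nodes whose references change are copied under a new name, so the root is renamed only if something below it is replaced. The catalog, the expansion lists, and every type name other than ip_address, timestamp, and Cidr are hypothetical, and the catalog is assumed acyclic. The time_zone symbol illustrates a replacement that itself depends on another basic symbol type.

```python
# Hypothetical catalog: composite type -> {field: type}; other names are basic.
FIELDS = {
    "flow": {"src": "ip_address", "ts": "timestamp"},
    "zoned_time": {"instant": "epoch_seconds", "zone": "time_zone"},
}
BASIC_SYMBOL = {"ip_address", "timestamp", "time_zone"}
# Hypothetical children of acyclic_expansion(d): (replacement, marked_incomplete).
EXPANSIONS = {
    "ip_address": [("ip_any", True), ("Cidr", False)],
    "timestamp": [("zoned_time", False)],
    "time_zone": [("iana_tz_name", False)],
}


def to_pure_concrete(root, fields, basic_symbol, expansions, suffix="_concrete"):
    """Return (new root name, updated catalog) with no basic symbol type reachable.

    Assumes the catalog is acyclic; the caller's catalog is not modified.
    """
    fields = {name: dict(fs) for name, fs in fields.items()}
    memo, active = {}, set()

    def choose(sym):
        # Choose a child of sym in acyclic_expansion(sym) not marked incomplete.
        for candidate, incomplete in expansions.get(sym, []):
            if not incomplete:
                return candidate
        raise LookupError(f"no complete expansion for {sym}")

    def rewrite(t):
        if t in memo:
            return memo[t]
        if t in active:
            raise ValueError(f"cyclic dependency through {t}")
        active.add(t)
        if t in basic_symbol:
            result = rewrite(choose(t))  # the replacement may contain symbols too
        elif t in fields:
            new = {f: rewrite(c) for f, c in fields[t].items()}
            result = t if new == fields[t] else t + suffix
            if result != t:
                fields[result] = new  # changed node n -> copy n' with a new name
        else:
            result = t  # basic concrete type: nothing to replace
        active.discard(t)
        memo[t] = result
        return result

    return rewrite(root), fields


root, catalog = to_pure_concrete("flow", FIELDS, BASIC_SYMBOL, EXPANSIONS)
print(root, catalog[root])  # flow_concrete {'src': 'Cidr', 'ts': 'zoned_time_concrete'}
print(catalog["zoned_time_concrete"])  # {'instant': 'epoch_seconds', 'zone': 'iana_tz_name'}
```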

Aspects herein also support transformation of a pure concrete data type to a pure carrier data type. For a data type to be deployed into a target schema system, it must be transformed into a pure carrier type. The above algorithm(s) provide for transforming a symbol type into a pure concrete type. Therefore, the input to the transformation algorithm can be a pure concrete type.

The algorithm substitutes carrier types for concrete types using the carrier_type_of relation. Aspects herein describe the carrier_type_of relation and how it can be extended using inference rules and the rep_by relation. Let CTOx be the carrier_type_of relation fully extended by the rep_by and carrier_type_of inference rules described herein. The algorithms for performing this inference are well known to experts in the field. If T is the target schema system, let CTOx(T) be CTOx restricted to target T.

Let dep_tree(d) be the dependency tree of d. An algorithm for transforming d from pure concrete to pure carrier can be described as follows:

 1. Make dc a copy of d but with a changed name;
 2. Compute dep_tree(dc);
 3. Let Dconc be the set of basic concrete types in the dependency tree of dc (found at the leaves of dep_tree(dc));
 4. For each d′ in Dconc, choose a carrier type c such that (d′, c) is in CTOx(T), denoting this choice by Carrier(d′);
    a. If there is a d″ in Dconc such that there is no matching record in CTOx(T), then the transformation fails;
 5. Compute an updated version of dc as follows:
    a. For every leaf I in dep_tree(dc) such that I is basic concrete:
       i. For each node n on the root-to-leaf path from dc to I:
          1. If n was not changed from dep_tree(dc):
             a. Create a new node n′ which is equal to n but with a new name;
             b. Replace the reference to n with a reference to n′ in p′ (the parent of n).
       ii. Replace I with Carrier(I) in p (the parent of I).

The algorithm to change a pure concrete type into a pure carrier type is not recursive, because basic types get replaced by other basic types.

The algorithm can fail to find a pure carrier type if there is a concrete type with no carrier_type_of mapping, even with the transitive closure of the inference rules. For example, a symbol type ip_address might be made concrete by replacing it with type Cidr (as described herein). To translate this type into JSON, a JSON carrier type is needed for Cidr. However, if there is no native Cidr type in JSON, then a conversion to JSON carrier types will fail.
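The carrier step and its failure mode can be sketched the same way. The CTOx table below is a hypothetical stand-in for the fully extended carrier_type_of relation keyed by target; it deliberately lacks a JSON record for Cidr, reproducing the failure just described, and then adds one as a hypothetical rep_by-derived mapping. When several carriers are listed, the sketch simply takes the first. Type names other than Cidr are hypothetical, and the catalog is assumed acyclic.

```python
# A hypothetical pure concrete catalog: composite type -> {field: type}.
FIELDS = {
    "flow_concrete": {"src": "Cidr", "ts": "zoned_time_concrete"},
    "zoned_time_concrete": {"instant": "epoch_seconds", "zone": "iana_tz_name"},
}
BASIC_CONCRETE = {"Cidr", "epoch_seconds", "iana_tz_name"}
# Hypothetical CTOx: target -> {basic concrete type: [carrier types]}; no JSON entry for Cidr.
CTOX = {"json": {"epoch_seconds": ["integer"], "iana_tz_name": ["string"]}}


def to_pure_carrier(root, fields, basic_concrete, ctox, target, suffix="_carrier"):
    """Return (new root, updated catalog); raise LookupError if some leaf has no carrier."""
    table = ctox.get(target, {})  # CTOx(T)
    fields = {name: dict(fs) for name, fs in fields.items()}

    # Steps 2-3: collect Dconc, the basic concrete leaves reachable from root.
    dconc, seen, stack = set(), set(), [root]
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        if t in fields:
            stack.extend(fields[t].values())
        elif t in basic_concrete:
            dconc.add(t)

    # Step 4: choose Carrier(t) for each leaf type, failing if CTOx(T) has no record.
    missing = sorted(t for t in dconc if not table.get(t))
    if missing:
        raise LookupError(f"no {target} carrier type for: {', '.join(missing)}")
    carrier = {t: table[t][0] for t in dconc}

    # Step 5: copy each changed composite under a new name; leaves swap basic-for-basic.
    memo = {}

    def rewrite(t):
        if t in carrier:
            return carrier[t]
        if t not in fields:
            return t
        if t not in memo:
            new = {f: rewrite(c) for f, c in fields[t].items()}
            memo[t] = t if new == fields[t] else t + suffix
            if memo[t] != t:
                fields[memo[t]] = new
        return memo[t]

    return rewrite(root), fields


try:
    to_pure_carrier("flow_concrete", FIELDS, BASIC_CONCRETE, CTOX, "json")
except LookupError as err:
    print(err)  # no json carrier type for: Cidr

# Recourse: a hypothetical rep_by-derived mapping of Cidr to a JSON string.
CTOX["json"]["Cidr"] = ["string"]
root, catalog = to_pure_carrier("flow_concrete", FIELDS, BASIC_CONCRETE, CTOX, "json")
print(root, catalog[root])  # flow_concrete_carrier {'src': 'string', 'ts': 'zoned_time_concrete_carrier'}
```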

There are several directions of recourse if the conversion from a pure concrete type to a pure carrier type fails. The target schema can be searched for an appropriate mapping if the concrete_type_of relationship is incomplete. The rep_by rules can be augmented to enable an appropriate mapping to be discovered in CTOx(JSON). If the pure concrete type was created from a symbol type that contained ip_address, the symbol-to-pure-concrete conversion can be revisited to use only symbol-to-concrete mappings such that the concrete types have a mapping to a carrier type in T.

The symbol-to-pure-concrete conversion algorithm can be augmented to find pure concrete types that have mappings to T as follows:

When computing acyclic_expansion(d), find the invalid substitutions by:

marking leaves which represent concrete types as invalid if the concrete type has no mapping in CTOx(T).

marking the interior nodes as invalid using the same algorithm used to mark nodes as incomplete, as described above.

In step 4.a.i.1 of the algorithm for transforming a symbol type to a pure concrete type, it may also be required that the replacement d is not marked invalid.

FIG. 8 provides further details on a data model database, illustrating an example entity relationship diagram 800 for a data model database disclosed herein. Entity relationship diagram 800 uses distinct lines to indicate one-to-one, one-to-many, and many-to-many relationships with respect to data types, constraints, fields, relationships, et cetera.

With the above contents and properties of data model database 700 understood, various methodologies can be implemented using data model database 700 or similar databases disclosed herein. FIG. 9 illustrates an example methodology 900 for converting data between types using a data model database disclosed herein. Methodology 900 begins at 902 and proceeds to 904, where aspects are performed including identifying a carrier type based on language-specific carrier data from a first system. Thereafter, at 906, aspects are performed including identifying, in a data model database, a language-agnostic concrete type associated with the language-specific carrier type. At 908, aspects are performed including identifying a symbol data type associated with the language-agnostic concrete type. At 910, aspects are performed including converting the language-specific carrier data to a second language-specific carrier type based on the symbol data type. Thereafter, at 912, methodology 900 ends.

In various embodiments, methodology 900 can further include identifying a second language-agnostic concrete type based on the symbol data type and identifying the second language-specific carrier type based on the second language-agnostic concrete type.
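The type-resolution part of methodology 900 (steps 906 to 910, plus the second concrete and carrier types) amounts to a chain of table lookups. In this sketch every table and type name is hypothetical, and converting the data value itself is not shown.

```python
# Hypothetical mapping tables; names are illustrative only.
CARRIER_TO_CONCRETE = {("yang", "date-and-time"): "iso8601_text"}
CONCRETE_TO_SYMBOL = {"iso8601_text": "timestamp", "epoch_seconds": "timestamp"}
SYMBOL_TO_CONCRETE = {"timestamp": ["iso8601_text", "epoch_seconds"]}
CONCRETE_TO_CARRIER = {
    ("iso8601_text", "json"): "string",
    ("epoch_seconds", "json"): "integer",
}


def resolve_target(source, carrier, target, prefer=None):
    """Return (symbol, second concrete type, second carrier type) for a carrier type."""
    concrete = CARRIER_TO_CONCRETE[(source, carrier)]  # 906: carrier -> concrete
    symbol = CONCRETE_TO_SYMBOL[concrete]  # 908: concrete -> symbol
    candidates = SYMBOL_TO_CONCRETE[symbol]
    if prefer is not None:
        candidates = [c for c in candidates if c == prefer]
    for concrete2 in candidates:  # second language-agnostic concrete type
        carrier2 = CONCRETE_TO_CARRIER.get((concrete2, target))
        if carrier2 is not None:  # second language-specific carrier type
            return symbol, concrete2, carrier2
    raise LookupError(f"no {target} carrier for symbol {symbol}")


print(resolve_target("yang", "date-and-time", "json"))  # ('timestamp', 'iso8601_text', 'string')
print(resolve_target("yang", "date-and-time", "json", prefer="epoch_seconds"))
# ('timestamp', 'epoch_seconds', 'integer')
```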

In various embodiments of methodology 900, converting the language-specific carrier data is performed using a mapping table defining a plurality of relationships (e.g., with a rep_by relationship). In embodiments, the mapping table includes a first relationship defining that the first language-specific carrier type and the second language-specific carrier type share the language-agnostic concrete type, and a second relationship defining that every value of the first language-specific carrier type exists among values of the second language-specific carrier type.

In various embodiments of methodology 900, converting the language-specific carrier data is performed using a constraint table. In alternative or complementary embodiments, the language-specific carrier data has an enumerated value.

FIG. 10 illustrates an example methodology 1000 for determining whether data is delete-safe using a data model database disclosed herein. Methodology 1000 begins at 1002 and proceeds to 1004, where aspects performed include receiving a request to delete a data type collection, wherein the data type collection is reflected in a data model database, and wherein the data type collection includes one or more data types, the one or more data types including at least one of a symbol data type, a language-agnostic concrete type, or a language-specific carrier data type. At 1006, aspects performed include determining a dependency chain for the data type collection using the data model database.

At 1008, a determination is made as to whether the data type collection is delete safe. Depending on whether the determination at 1008 is positive or negative, aspects performed include indicating the data type collection as delete safe (1010) or not delete safe (1012), wherein the data type collection is marked as delete safe (1010) if the dependency chain indicates that remaining data types in the data model database do not include references to the one or more data types of the data type collection, and wherein the data type collection is marked not delete safe (1012) if the dependency chain indicates that one or more of the remaining data types in the data model database include references to the one or more data types of the data type collection.
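Checking direct references is sufficient for the determination at 1008: any path from a remaining type into the collection must at some point cross a direct reference from a remaining type to a member. A minimal sketch, assuming a hypothetical map from each type to the set of types it references (fields and derived_from):

```python
def delete_safety(depends_on, collection):
    """Return (delete_safe, blockers) for deleting every type in collection.

    The collection is delete safe if no remaining type references any of
    its members; blockers are the remaining types that do.
    """
    doomed = set(collection)
    blockers = {t for t, refs in depends_on.items() if t not in doomed and refs & doomed}
    return not blockers, blockers


# Hypothetical catalog: each type -> set of types it references.
DEPENDS_ON = {
    "flow": {"ip_address", "timestamp"},
    "zoned_time": {"timestamp"},
    "ip_address": set(),
    "timestamp": set(),
}
safe, blockers = delete_safety(DEPENDS_ON, {"timestamp"})
print(safe, sorted(blockers))  # False ['flow', 'zoned_time']
print(delete_safety(DEPENDS_ON, {"flow", "zoned_time", "timestamp"})[0])  # True
```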

In various embodiments of methodology 1000, the data type collection is a composite data field. In some embodiments, the composite data field is one of a structure or a container.

In various embodiments of methodology 1000, the dependency chain is determined using a mapping table including a plurality of relationships. In embodiments, the plurality of relationships can include a symbol relationship property shared by two or more symbol data types, a concrete relationship property shared between two or more concrete data types, and a carrier relationship property shared by two or more carrier data types. In embodiments, the plurality of relationships indicate that one or more of the remaining data types are derived from the one or more data types, or, in alternative embodiments, the plurality of relationships indicate that no remaining data types are derived from the one or more data types.

Various other methodologies are enabled using the data model database disclosed herein. A methodology disclosed herein performs automatically inferring the carrier types that can represent a concrete type. Another methodology disclosed herein performs expanding a symbol type to a concrete type. Another methodology disclosed herein performs inferring concrete_type_of relationships. Another methodology disclosed herein performs determining whether an expansion of a symbol type is cyclic. Another methodology disclosed herein performs determining whether an expansion of a symbol type is incomplete.

Another methodology disclosed herein performs inferring that two types are similar by replacing concrete types with symbol types. This is achieved by applying the concrete_type_of relationship in reverse to infer that two different representations of a particular symbol (e.g., a timestamp) both refer to the symbol type (e.g., timestamp).

Another methodology disclosed herein performs determining when it is safe to delete a data type, or when it is safe to update a data type without incrementing its version number.

Another methodology disclosed herein performs detecting when a collection of data types is recursive. This methodology is useful because, although tools that handle modeling languages such as XSD or JSON Schema may include recursion detection algorithms, the model must be loaded into memory before these can be used. Models are described in files, and one model can “include” the definitions in another. So it is possible that a modification in one file can create a recursion when included by another file. When all of the model elements are in a database, the detection algorithm can be performed at update time, overcoming the limitations of conventional recursion detection algorithms.

Another methodology disclosed herein performs determining if a data type is version-consistent. Another methodology disclosed herein performs determining if a data type is transitively deprecated. Another methodology disclosed herein performs determining if a data type is up-to-date. Another methodology disclosed herein performs transforming a concrete type into a deployable (carrier) type.

Still further methodologies involve determining if any/all/no expansions of a symbol type via concrete_type_of expansions are cyclic. Another methodology disclosed herein performs determining if any/all/no expansions of a symbol type via concrete_type_of expansions are incomplete.

Another methodology disclosed herein performs determining if two basic types are the same. Two basic types are the same if they use the same or equivalent built-in types (e.g., concrete types not derived from anything that can be mapped by rep_by to the same basic type) and they have the same constraints. These can be explicit types or anonymous types (types defined in field definitions). For example, a field can be defined as a string with possible values (‘tcp’, ‘icmp’). By putting all of these constraints in a database, it is possible to find these multiple definitions.

Another methodology disclosed herein performs determining if two basic types are similar based on their constraints. For example, one field can be a string with possible values (‘tcp’, ‘udp’, ‘icmp’) and another might be a string with possible values (‘tcp’, ‘udp’, ‘igmp’). SQL queries can be written which search for types with overlapping enum sets, and a match can be found, indicating that these types refer to related things.
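One such query is a self-join on enumerated values. The sketch below uses Python's sqlite3 module with a hypothetical enum_value(type_name, value) table rather than the actual data_type_constraint schema; the type names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enum_value (type_name TEXT, value TEXT)")
rows = (
    [("proto_a", v) for v in ("tcp", "udp", "icmp")]
    + [("proto_b", v) for v in ("tcp", "udp", "igmp")]
    + [("color", v) for v in ("red", "green")]
)
conn.executemany("INSERT INTO enum_value VALUES (?, ?)", rows)

# Pairs of distinct types sharing at least one enumerated value, most overlap first.
OVERLAP = """
SELECT a.type_name, b.type_name, COUNT(*) AS shared
FROM enum_value AS a
JOIN enum_value AS b ON a.value = b.value AND a.type_name < b.type_name
GROUP BY a.type_name, b.type_name
ORDER BY shared DESC
"""
print(conn.execute(OVERLAP).fetchall())  # [('proto_a', 'proto_b', 2)]
```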

Another methodology disclosed herein performs finding “attributes” (a field name that has a well-established meaning). All of the fields are in a table, along with their data types. A simple query extracts repeated field names with their data types. A field name that often has the same type is a candidate for being declared an attribute. Fields with the same name but a different type are flagged as problems.
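Both queries can be sketched with sqlite3 against a hypothetical fields(data_type, field_name, field_type) table; the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fields (data_type TEXT, field_name TEXT, field_type TEXT)")
conn.executemany("INSERT INTO fields VALUES (?, ?, ?)", [
    ("vnf", "vnf_id", "string"), ("vserver", "vnf_id", "string"), ("pnf", "vnf_id", "string"),
    ("flow", "port", "integer"), ("acl", "port", "string"),
])

# Attribute candidates: a field name repeatedly declared with the same type.
REPEATED = """
SELECT field_name, field_type, COUNT(*) AS uses FROM fields
GROUP BY field_name, field_type HAVING COUNT(*) > 1
ORDER BY uses DESC
"""
# Problems: one field name declared with more than one type.
CONFLICTS = """
SELECT field_name, COUNT(DISTINCT field_type) FROM fields
GROUP BY field_name HAVING COUNT(DISTINCT field_type) > 1
"""
print(conn.execute(REPEATED).fetchall())  # [('vnf_id', 'string', 3)]
print(conn.execute(CONFLICTS).fetchall())  # [('port', 2)]
```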

Further methodologies disclosed herein provide searching capability. For example, a search can be conducted for all data types which are relevant to a “DNS_VNF”. This search can be restricted to the data type name, data type comment, field name, field comments, constraints, or any combination. These search constraints can be further refined, e.g., to search for multiple keywords used in conjunction. Conventional text search capabilities, e.g., grep, do not provide this searching functionality, as 1) the collection of all text files which define the types might not be readily available and 2) regular text search generally returns many false positives.

As described above, the exemplary embodiments can be in the form of processor-implemented processes and devices for practicing those processes, such as a server in a regional network or cloud data center. The exemplary embodiments may be embodied as either centralized hardware and software or distributed hardware and software. The exemplary embodiments can also be in the form of computer program code containing instructions embodied in tangible media, such as floppy diskettes, CD ROMs, hard drives, or any other computer-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes a device for practicing the exemplary embodiments. The exemplary embodiments can also be in the form of computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes a device for practicing the exemplary embodiments. When implemented on a general-purpose microprocessor, the computer program code segments configure the microprocessor to create specific logic circuits.

While the invention has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed for carrying out this invention, but that the invention will include all embodiments falling within the scope of the claims. Moreover, the use of the terms first, second, etc., does not denote any order or importance, but rather the terms first, second, etc., are used to distinguish one element from another. Furthermore, the use of the terms a, an, etc., does not denote a limitation of quantity, but rather denotes the presence of at least one of the referenced item.

What is claimed is:
 1. At least one non-transitory computer readable medium configured to store a data model database comprising: a plurality of symbol data types, each of the plurality of symbol data types having one or more symbol data fields; a plurality of concrete data types, each of the concrete data types having one or more language-agnostic concrete fields associated with each of the one or more symbol data fields, each of the one or more language-agnostic concrete fields applying one or more concrete constraints to each of the corresponding symbol data fields; and a plurality of carrier data types, the plurality of carrier data types having one or more language-specific carrier fields associated with each of the one or more language-agnostic concrete fields, each of the one or more language-specific carrier fields applying one or more carrier constraints to each of the corresponding language-agnostic concrete fields.
 2. The at least one non-transitory computer readable medium of claim 1, the data model database further comprising: one or more composite data types.
 3. The at least one non-transitory computer readable medium of claim 2, wherein the one or more composite data types include a structure of at least one of the one or more symbol data fields.
 4. The at least one non-transitory computer readable medium of claim 2, wherein the one or more composite data types include a container of one or more of the plurality of symbol data types.
 5. The at least one non-transitory computer readable medium of claim 1, the data model database further comprising: a mapping table defining a plurality of relationships, wherein the plurality of relationships include a symbol relationship property shared by two of the plurality of symbol data types, wherein the plurality of relationships include a concrete relationship property shared between two of the plurality of concrete data types, and wherein a carrier relationship property is shared by two of the plurality of carrier data types.
 6. The at least one non-transitory computer readable medium of claim 1, the data model database further comprising: a plurality of constraint tables storing the one or more concrete constraints and the one or more carrier constraints.
 7. The at least one non-transitory computer readable medium of claim 1, wherein at least one of the one or more symbol data fields, one or more language-agnostic concrete fields, and one or more language-specific carrier fields is populated with an enumerated value.
 8. A method, comprising: identifying a carrier type based on language-specific carrier data from a first system; identifying, in a data model database, a language-agnostic concrete type associated with the language-specific carrier type; identifying a symbol data type associated with the language-agnostic concrete type; and converting the language-specific carrier data to a second language-specific carrier type based on the symbol data type.
 9. The method of claim 8, further comprising: identifying a second language-agnostic concrete type based on the symbol data type; and identifying the second language-specific carrier type based on the second language-agnostic concrete type.
 10. The method of claim 8, wherein converting the language-specific carrier data is performed using a mapping table defining a plurality of relationships.
 11. The method of claim 10, wherein the mapping table includes a first relationship defining that the first language-specific carrier type and the second language-specific carrier type share the language-agnostic concrete type, and wherein the mapping table includes a second relationship defining that every value of the first language-specific carrier type exists among values of the second language-specific carrier type.
 12. The method of claim 8, wherein converting the language-specific carrier data is performed using a constraint table.
 13. The method of claim 8, wherein the language-specific carrier data has an enumerated value.
 14. A method, comprising: receiving a request to delete a data type collection, wherein the data type collection is reflected in a data model database, and wherein the data type collection includes one or more data types, the one or more data types including at least one of a symbol data type, a language-agnostic concrete type, or a language-specific carrier data type; determining a dependency chain for the data type collection using the data model database; and indicating the data type collection as delete safe or not delete safe, wherein the data type collection is marked as delete safe if the dependency chain indicates that remaining data types in the data model database do not include references to the one or more data types of the data type collection, and wherein the data type collection is marked not delete safe if the dependency chain indicates that one or more of the remaining data types in the data model database include references to the one or more data types of the data type collection.
 15. The method of claim 14, wherein the data type collection is a composite data field.
 16. The method of claim 15, wherein the composite data field is one of a structure or a container.
 17. The method of claim 15, wherein the dependency chain is determined using a mapping table including a plurality of relationships.
 18. The method of claim 17, wherein the plurality of relationships include a symbol relationship property shared by two or more symbol data types, wherein the plurality of relationships include a concrete relationship property shared between two or more concrete data types, and wherein a carrier relationship property is shared by two or more carrier data types.
 19. The method of claim 17, wherein the plurality of relationships indicate one or more of the remaining data types are derived from the one or more data types.
 20. The method of claim 17, wherein the plurality of relationships indicate no remaining data types are derived from the one or more data types.
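For illustration only, the three-layer relationship recited in claim 1, the conversion path of claims 8 and 9, and the delete-safe test of claim 14 can be sketched as a small in-memory model. This Python sketch is not the disclosed implementation. It assumes that every type has a name that is unique across all three layers, that each concrete type refers to exactly one symbol type, and that each carrier type refers to exactly one concrete type. All class, method, and type names (`DataModelDB`, `convert`, `is_delete_safe`, `uint16_port`, `yang_port_number`, `json_port`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SymbolType:
    name: str  # abstract, language-independent meaning (hypothetical naming)


@dataclass(frozen=True)
class ConcreteType:
    name: str
    symbol: str  # assumption: name of the single SymbolType this constrains
    constraints: tuple = ()  # language-agnostic constraints, e.g. a value range


@dataclass(frozen=True)
class CarrierType:
    name: str
    language: str  # e.g. "YANG", "JSON Schema"
    concrete: str  # assumption: name of the single ConcreteType it realizes


@dataclass
class DataModelDB:
    """Hypothetical store; assumes type names are unique across all layers."""
    symbols: dict = field(default_factory=dict)
    concretes: dict = field(default_factory=dict)
    carriers: dict = field(default_factory=dict)

    def add(self, t):
        # Route each type into the table for its layer.
        table = {SymbolType: self.symbols, ConcreteType: self.concretes,
                 CarrierType: self.carriers}[type(t)]
        table[t.name] = t

    def convert(self, carrier_name, target_language):
        """Walk carrier -> concrete -> symbol, then back down to a carrier
        in the target language that realizes the same symbol type."""
        concrete = self.concretes[self.carriers[carrier_name].concrete]
        symbol = concrete.symbol
        candidates = [c for c in self.carriers.values()
                      if c.language == target_language
                      and self.concretes[c.concrete].symbol == symbol]
        # Prefer a carrier sharing the source's concrete type (False sorts first).
        candidates.sort(key=lambda c: c.concrete != concrete.name)
        return candidates[0] if candidates else None

    @staticmethod
    def references(t):
        """Names that a type points to in the symbol/concrete/carrier chain."""
        if isinstance(t, ConcreteType):
            return {t.symbol}
        if isinstance(t, CarrierType):
            return {t.concrete}
        return set()

    def is_delete_safe(self, names):
        """Safe only if no type outside the collection references a member."""
        names = set(names)
        remaining = [t for table in (self.symbols, self.concretes, self.carriers)
                     for t in table.values() if t.name not in names]
        return not any(self.references(t) & names for t in remaining)


# Usage with hypothetical types: one symbol, one concrete type, two carriers.
db = DataModelDB()
db.add(SymbolType("port_number"))
db.add(ConcreteType("uint16_port", symbol="port_number",
                    constraints=("0 <= v <= 65535",)))
db.add(CarrierType("yang_port_number", language="YANG", concrete="uint16_port"))
db.add(CarrierType("json_port", language="JSON Schema", concrete="uint16_port"))

print(db.convert("yang_port_number", "JSON Schema").name)  # json_port
print(db.is_delete_safe({"uint16_port"}))  # False: both carriers reference it
print(db.is_delete_safe({"uint16_port", "yang_port_number", "json_port"}))  # True
```

In this sketch, the delete check inspects only direct references from types outside the collection. That matches the condition claim 14 uses to mark a collection delete safe. A fuller system might instead derive these references from a mapping table such as the one in claim 17.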